Re: Add 64-bit XIDs into PostgreSQL 15

2024-07-25 Thread Chris Travers
On Thu, Jul 25, 2024 at 5:19 PM Peter Eisentraut 
wrote:

> On 23.07.24 11:13, Aleksander Alekseev wrote:
> >> Here is the fix. It can be tested like this:
> >> [...]
> >
> > PFA the rebased patchset.
>
> I'm wondering about the 64-bit GUCs.
>
> At first, it makes sense that if there are settings that are counted in
> terms of transactions, and transaction numbers are 64-bit integers, then
> those settings should accept 64-bit integers.
>
> But what is the purpose and effect of setting those parameters to such
> huge numbers?  For example, what is the usability of being able to set
>
> vacuum_failsafe_age = 5000000000
>

Also in the rebased patch set I cannot find the above, so I cannot evaluate
what it does.

In the past I have pushed for some mechanism to produce warnings, like the
ones we currently have when approaching xid wraparound, once a certain
threshold is met.  Is this that mechanism?

>
> I think in the world of 32-bit transaction IDs, you can intuitively
> interpret most of these "transaction age" settings as "percent toward
> disaster".  For example,
>
> vacuum_freeze_table_age = 150000000
>
> is 7% toward disaster, and
>
> vacuum_failsafe_age = 1600000000
>
> is 75% toward disaster.
>
> However, if there is no more disaster threshold at 2^31, what is the
> guidance for setting these?  Or more radically, why even run
> transaction-count-based vacuum at all?
>
> Conversely, if there is still some threshold (not disaster, but
> efficiency or something else), would it still be useful to keep these
> settings well below 2^31?  In which case, we might not need 64-bit GUCs.
>
> Your 0004 patch adds support for 64-bit GUCs but doesn't actually
> convert any existing GUCs to use that.  (Unlike the reloptions, which
> your patch converts.)  And so there is no documentation about these
> questions.
>
>

-- 
Best Wishes,
Chris Travers

Efficito:  Hosted Accounting and ERP.  Robust and Flexible.  No vendor
lock-in.
http://www.efficito.com/learn_more


Re: commitfest.postgresql.org is no longer fit for purpose

2024-05-17 Thread Chris Travers
I think there are actually a number of factors that make this much harder.

On Fri, May 17, 2024 at 2:33 PM Heikki Linnakangas  wrote:
>
> On 17/05/2024 05:26, Robert Haas wrote:
> > For bonus points, suppose we make it so that when you click the link,
> > it takes you to a box where you can type in a text comment that will
> > display in the app, explaining why your patch needs review: "Robert
> > says the wire protocol changes in this patch are wrong, but I think
> > he's full of hot garbage and want a second opinion!" (a purely
> > hypothetical example, I'm sure) If you put in a comment like this when
> > you register your patch for the CommitFest, it gets a sparkly gold
> > border and floats to the top of the list, or we mail you a Kit-Kat
> > bar, or something. I don't know.
>
> Dunno about having to click a link or sparkly gold borders, but +1 on
> having a free-form text box for notes like that. Things like "cfbot says
> this has bitrotted" or "Will review this after other patch this depends
> on". On the mailing list, notes like that are both noisy and easily lost
> in the threads. But as a little free-form text box on the commitfest, it
> would be handy.
>
> One risk is that if we start to rely too much on that, or on the other
> fields in the commitfest app for that matter, we de-value the mailing
> list archives. I'm not too worried about it, the idea is that the
> summary box just summarizes what's already been said on the mailing
> list, or is transient information like "I'll get to this tomorrow"
> that's not interesting to archive.
>
> > The point is - we need a much better signal to noise ratio here. I bet
> > the number of patches in the CommitFest that actually need review is
> > something like 25% of the total. The rest are things that are just
> > parked there by a committer, or that the author doesn't care about
> > right now, or that are already being actively discussed, or where
> > there's not a clear way forward. We could create new statuses for all
> > of those states - "Parked", "In Hibernation," "Under Discussion," and
> > "Unclear" - but I think that's missing the point. What we really want
> > is to not see that stuff in the first place. It's a CommitFest, not
> > once-upon-a-time-I-wrote-a-patch-Fest.

Yeah this is a problem.

I think in cases where something is in hibernation or unclear it really
should be "returned with feedback."  There's really nothing stopping
someone from learning from the experience and resubmitting an improved
version.

I think under discussion is also rather unclear.  The current statuses
already cover this sort of thing (waiting for author and waiting for
review).  Maybe we could improve the categories here but it is
important to note whether the author or a reviewer is expected to take
the next step.

If the author doesn't respond within a period of time (let's say 30
days), I think we can just say "returned with feedback."

Since you can already attach older threads to a commitfest entry, it
seems to me that we ought to be more aggressive with "returned with
feedback" and note that this doesn't necessarily mean "rejected", which
is a separate status that we rarely use.
>
> Yeah, I'm also skeptical of adding new categories or statuses to the
> commitfest.
>
> I sometimes add patches to the commitfest that are not ready to be
> committed, because I want review on the general idea or approach, before
> polishing the patch to final state. That's also a fine use of the
> commitfest app. The expected workflow is to get some review on the
> patch, and then move it back to Waiting on Author.
>
> --
> Heikki Linnakangas
> Neon (https://neon.tech)
>
>
>




Re: Update docs for default value of fdw_tuple_cost

2024-01-03 Thread Chris Travers
The following review has been posted through the commitfest application:
make installcheck-world:  not tested
Implements feature:   not tested
Spec compliant:   not tested
Documentation:tested, passed

Good catch.  This is a trivial fix and so I hope we can just get it in right 
away.

The new status of this patch is: Ready for Committer


Re: Moving forward with TDE

2023-12-16 Thread Chris Travers
Hi,

I was re-reading the patches here and there was one thing I didn't understand.

I see there are provisions for separate data encryption keys for the primary and 
replica, and these share a single WAL key.

But if I am setting up a replica from the primary, and the primary is already 
encrypted, do these forcibly share the same data encryption keys?  Is there a 
need (possibly in a follow-up patch) for an ability to decrypt and re-encrypt in 
pg_basebackup (which would need access to both keys), or is this handled already 
and I just missed it?

Best Wishes,
Chris Travers

Re: UUID v7

2023-10-09 Thread Chris Travers
So I am in the process of reviewing the patch and hopefully can provide 
something there soon.

However, I want to address in the meantime the question of timestamp functions. 
 I know that is outside the scope of this patch, but I would be in favor of 
adding them generally, not just as an extension but eventually into core.  I 
understand (and generally agree with) the logic of not generally extracting 
timestamps from UUIDs or other such fields, but there are cases where it is 
really, really helpful to be able to do so.  In particular when you are 
troubleshooting misbehavior, all information you can get is helpful.  And so 
extracting all of the subfields can be helpful.
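
To make that concrete, here is a rough sketch of the kind of helper I have in 
mind.  This is not from the patch; the function name is mine, and it assumes the 
draft UUIDv7 layout where the first 48 bits are a Unix timestamp in milliseconds:

-- Sketch only: assumes the draft UUIDv7 layout (first 48 bits = Unix epoch
-- milliseconds).  Name and packaging are hypothetical.
CREATE FUNCTION uuid_v7_extract_timestamp(u uuid) RETURNS timestamptz
LANGUAGE sql IMMUTABLE AS $$
  SELECT to_timestamp(
           ('x' || substr(replace(u::text, '-', ''), 1, 12))::bit(48)::bigint
           / 1000.0);
$$;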

The problem with putting this in an extension is that it is mostly useful 
when debugging systems (particularly larger distributed systems), and so the 
chances of it reaching enough critical mass to be supported by all major cloud 
vendors are effectively zero.

So I am not asking for this to be included in this patch but I am saying I 
would love to see this sort of thing contributed to core at some point.

Re: New PostgreSQL Contributors

2023-07-30 Thread Chris Travers
Congrats!

On Fri, Jul 28, 2023 at 10:29 PM Christoph Berg  wrote:

> The PostgreSQL contributors team has been looking over the community
> activity and, over the first half of this year, has been recognizing
> new contributors to be listed on
>
> https://www.postgresql.org/community/contributors/
>
> New PostgreSQL Contributors:
>
>   Ajin Cherian
>   Alexander Kukushkin
>   Alexander Lakhin
>   Dmitry Dolgov
>   Hou Zhijie
>   Ilya Kosmodemiansky
>   Melanie Plageman
>   Michael Banck
>   Michael Brewer
>   Paul Jungwirth
>   Peter Smith
>   Vigneshwaran C
>
> New PostgreSQL Major Contributors:
>
>   Julien Rouhaud
>   Stacey Haysler
>   Steve Singer
>
> Thank you all for contributing to PostgreSQL to make it such a great
> project!
>
> For the contributors team,
> Christoph
>
>
>


Re: Moving forward with TDE

2023-03-28 Thread Chris Travers
> > Consider that there are standards groups which explicitly consider these
> attack
> > vectors and consider them important enough to require mitigations to
> address
> > those vectors. Do the end users of PG understand the attack vectors or
> why they
> > matter?  Perhaps not, but just because they can’t articulate the
> reasoning does
> > NOT mean that the attack vector doesn’t exist or that their environment
> is
> > somehow immune to it- indeed, as the standards bodies surely know, the
> opposite
> > is true- they’re almost certainly at risk of those attack vectors and
> therefore
> > the standards bodies are absolutely justified in requiring them to
> provide a
> > solution. Treating these users as unimportant because they don’t have
> the depth
> > of understanding that we do or that the standards body does is not
> helping
> > them- it’s actively driving them away from PG.
>
> Well, then who is going to explain them here, because I have not heard
> them yet.
>
> > The RLS arguments were that queries could expose some of the
> underlying
> > data, but in summary, that was considered acceptable.
> >
> > This is an excellent point- and dovetails very nicely into my argument
> that
> > protecting primary data (what is provided by users and ends up in
> indexes and
> > heaps) is valuable even if we don’t (yet..) have protection for other
> parts of
> > the system. Reducing the size of the attack vector is absolutely useful,
> > especially when it’s such a large amount of the data in the system. Yes,
> we
> > should, and will, continue to improve- as we do with many features, but
> we
> > don’t need to wait for perfection to include this feature, just as with
> RLS and
> > numerous other features we have.
>
> The issue is that you needed a certain type of user with a certain type
> of access to break RLS, while for this, writing to PGDATA is the simple
> case for all the breakage, and the thing we are protecting with
> authentication.
>
> > > > We, as a community, are clearly losing value by lack of this
> > capability,
> > > if by
> > > > no other measure than simply the numerous users of the
> commercial
> > > > implementations feeling that they simply can’t use PG
> without this
> > > feature, for
> > > > whatever their reasoning.
> > >
> > > That is true, but I go back to my concern over useful feature
> vs.
> > check
> > > box.
> > >
> > > While it’s easy to label something as checkbox, I don’t feel we
> have been
> > fair
> >
> > No, actually, it isn't.  I am not sure why you are saying that.
> >
> > I’m confused as to what is required to label a feature as a “checkbox”
> feature
> > then. What did you us to make that determination of this feature?  I’m
> happy to
> > be wrong here.
>
> I don't see the point in me continuing to reply here.  You just seem to
> continue asking questions without actually thinking of what I am saying,
> and hope I get tired or something.
>
> --
>   Bruce Momjian  https://momjian.us
>   EDB  https://enterprisedb.com
>
>   Embrace your flaws.  They make you human, rather than perfect,
>   which you will never be.
>


-- 
Best Wishes,
Chris Travers

Efficito:  Hosted Accounting and ERP.  Robust and Flexible.  No vendor
lock-in.
http://www.efficito.com/learn_more


Re: Moving forward with TDE

2023-03-28 Thread Chris Travers
On Tue, Mar 28, 2023 at 5:02 AM Stephen Frost  wrote:

>
> > There's clearly user demand for it as there's a number of organizations
>> > who have forks which are providing it in one shape or another.  This
>> > kind of splintering of the community is actually an actively bad thing
>> > for the project and is part of what killed Unix, by at least some pretty
>> > reputable accounts, in my view.
>>
>> Yes, the number of commercial implementations of this is a concern.  Of
>> course, it is also possible that those commercial implementations are
>> meeting checkbox requirements rather than technical ones, and the
>> community has been hostile to check box-only features.
>
>
> I’ve grown weary of this argument as the other major piece of work it was
> routinely applied to was RLS and yet that has certainly been seen broadly
> as a beneficial feature with users clearly leveraging it and in more than
> some “checkbox” way.
>
> Indeed, it’s similar also in that commercial implementations were done of
> RLS while there were arguments made about it being a checkbox feature which
> were used to discourage it from being implemented in core.  Were it truly
> checkbox, I don’t feel we would have the regular and ongoing discussion
> about it on the lists that we do, nor see other tools built on top of PG
> which specifically leverage it. Perhaps there are truly checkbox features
> out there which we will never implement, but I’m (perhaps due to what my
> dad would call selective listening on my part, perhaps not) having trouble
> coming up with any presently. Features that exist in other systems that we
> don’t want?  Certainly. We don’t characterize those as simply “checkbox”
> though. Perhaps that’s in part because we provide alternatives- but that’s
> not the case here. We have no comparable way to have this capability as
> part of the core system.
>
> We, as a community, are clearly losing value by lack of this capability,
> if by no other measure than simply the numerous users of the commercial
> implementations feeling that they simply can’t use PG without this feature,
> for whatever their reasoning.
>

I also think this is something of a problem because very few requirements
are actually purely technical requirements, and I think the issue is that
in many cases there are ways around the lack of the feature.

So I would phrase this differently.  What is the value of doing this in
core?

This dramatically simplifies the question of setting up a PostgreSQL
environment that is properly protected with encryption at rest.  That in
itself is valuable.  Today you can accomplish something similar with
encrypted filesystems and encryption options in things like pgbackrest.
However, these are many different pieces of a solution, and messing up the
setup of any one of them can compromise the data.  Having a single point of
encryption and decryption means fewer opportunities to mess it up and that
means less risk.  This in turn makes it easier to settle on using
PostgreSQL.

There are certainly going to be those who approach encryption at rest as a
checkbox item and who don't really care if there are holes in it.  But
there are others who really should be concerned (and this is becoming a
bigger issue where data privacy, PCI-DSS, and other requirements may come
into play), and those need better tooling than we have.  I also think that
as data privacy becomes a larger issue, this will become a larger topic.

Anyway, that's my contribution to that question.

Best Wishes,
Chris Travers

>
> Thanks,
>
> Stephen
>


-- 
Best Wishes,
Chris Travers

Efficito:  Hosted Accounting and ERP.  Robust and Flexible.  No vendor
lock-in.
http://www.efficito.com/learn_more


Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

2023-03-10 Thread Chris Travers
"Right, the improvement this patch gives to the heap is not the full 
motivation. Another motivation is the improvement it gives to TableAM API. Our 
current API implies that the effort on locating the tuple by tid is small. This 
is more or less true for the heap, where we just need to pin and lock the 
buffer. But imagine other TableAM implementations, where locating a tuple is 
more expensive."

Yeah. Our TableAM API is a very nice start toward pluggable storage, but we 
still have a long way to go before we can really provide a wide variety of 
pluggable storage engines.

In particular, the following approaches are likely to have much more expensive 
tid lookups:
 - columnar storage (may require a lot of random IO to reconstruct a tuple)
 - index oriented storage (tid no longer physically locatable in the file via 
seek)
 - compressed cold storage like pg_cryogen (again, seeks may be problematic).

To my mind, the performance benefits are a nice side effect; the main interest 
I have in this is the improvement to TableAM capabilities.  I cannot see how to 
do this without a lot more infrastructure.

Re: Moving forward with TDE

2023-03-06 Thread Chris Travers
The following review has been posted through the commitfest application:
make installcheck-world:  not tested
Implements feature:   not tested
Spec compliant:   not tested
Documentation:not tested

I have decided to write a review here in terms of whether we want this feature, 
and perhaps the way we should look at encryption as a project down the road, 
since I think this is only the beginning.  I am hoping to run some full tests 
of the feature sometime in coming weeks.  Right now this review is limited to 
the documentation and documented feature.

From the documentation, the primary threat model of TDE is to prevent 
decryption of data from archived wal segments (and data files), for example on 
a backup system.  While there are other methods around this problem to date, I 
think that this feature is worth pursuing for that reason.  I want to address a 
couple of reasons for this and then go into some reservations I have about how 
some of this is documented.

There are current workarounds to ensuring encryption at rest, but these have a 
number of problems.  Encryption passphrases end up lying around the system in 
various places.  Key rotation is often difficult.  And one mistake can easily 
render all efforts ineffective.  TDE solves these problems.  The overall design 
from the internal docs looks solid.  This definitely is something I would 
recommend for many users.

I have a couple small caveats though.  Encryption of data is a large topic and 
there isn't a one-size-fits-all solution to industrial or state requirements.  
Having all this key management available in PostgreSQL is a very good thing.  
Long run it is likely to end up being extensible, and therefore both more 
powerful and offering a wider range of choices for solution architects.  
Implementing encryption is also something that is easy to mess up.  For this 
reason I think it would be great if we had a standardized format for discussing 
encryption options that we could use going forward.  I don't think that should 
be held against this patch but I think we need to start discussing it now 
because it will be a bigger problem later.

A second caveat I have is that key management is a topic where you really need 
a good overview of internals in order to implement effectively.  If you don't 
know how an SSL handshake works or what is in a certificate, you can easily 
make mistakes in setting up SSL.  I can see the same thing happening here.  For 
example, I don't think it would be safe to leave the KEK on an encrypted 
filesystem that is decrypted at runtime (or at least I wouldn't consider that 
safe -- your appetite for risk may vary).

My proposal would be to build a template for encryption options in the 
documentation.  This could include topics like SSL as well.  In such a template 
we'd have sections like "Threat model,"  "How it works," "Implementation 
Requirements" and so forth.  Again I don't think this needs to be part of the 
current patch but I think it is something we need to start thinking about now.  
Maybe after this goes in, I can present a proposed documentation patch.

I will also note that I don't consider myself to be very qualified on topics 
like encryption.  I can reason about key management to some extent but some 
implementation details may be beyond me.  I would hope we could get some extra 
review on this patch set soon.

Re: Add 64-bit XIDs into PostgreSQL 15

2022-11-29 Thread Chris Travers
On Tue, Nov 29, 2022 at 5:57 PM Bruce Momjian  wrote:

> On Tue, Nov 29, 2022 at 11:41:07AM -0500, Robert Haas wrote:
> > My argument is that removing xidStopLimit is totally fine, because it
> > only serves to stop the database. What to do about xidWarnLimit is a
> > slightly more complex question. Certainly it can't be left untouched,
> > because warning that we're about to shut down the database for lack of
> > allocatable XIDs is not sensible if there is no such lack and we
> > aren't going to shut it down. But I'm also not sure if the model is
> > right. Doing nothing for a long time and then warning in every
> > transaction when some threshold is crossed is an extreme behavior
> > change. Right now that's somewhat justified because we're about to hit
> > a brick wall at full speed, but if we remove the brick wall and
> > replace it with a gentle pelting with rotten eggs, it's unclear that a
> > similarly strenuous reaction is the right model. But that's also not
> > to say that we should do nothing at all.
>
> Yeah, we would probably need to warn on every 1 million transactions or
> something.
>
>
My proposal would be to make the threshold configurable and start warning
on every transaction after that.  There are a couple reasons to do that.

The first is that noisy warnings are extremely easy to see.  You get them
in cron emails, from psql, in the db logs etc.  Having them every million
makes them harder to catch.

The point here is not to ensure there are no problems, but to make sure
that an existing layer in the current swiss cheese model of safety doesn't
go away.  Will it stop all problems?  No.  But the current warning strategy
is effective, given how many times we hear of cases of people having to
take drastic action to avoid impending xid wraparound.

If someone has an insert-only database and maybe doesn't want to ever
freeze, they can set the threshold to -1 or something.  I would suggest
keeping the default at 2 billion to be in line with existing limitations
and practices.  People can then adjust as they see fit.

Warning text might be something like "XID Lag Threshold Exceeded.  Is
autovacuum clearing space and keeping up?"
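
For context, this is the sort of check the threshold would correspond to.  It is
not part of the patch, just what one can already run against today's catalogs;
the 2 billion figure is the default I am suggesting, not an existing limit once
XIDs are 64-bit:

SELECT datname,
       age(datfrozenxid) AS xid_lag,
       round(100.0 * age(datfrozenxid) / 2000000000, 1) AS pct_of_warn_threshold
FROM pg_database
ORDER BY xid_lag DESC;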


> --
>   Bruce Momjian  https://momjian.us
>   EDB  https://enterprisedb.com
>
> Embrace your flaws.  They make you human, rather than perfect,
> which you will never be.
>


Re: Add 64-bit XIDs into PostgreSQL 15

2022-11-29 Thread Chris Travers
On Mon, Nov 28, 2022 at 11:06 PM Peter Geoghegan  wrote:

> On Mon, Nov 28, 2022 at 1:52 PM Bruce Momjian  wrote:
> > I think the problem is that we still have bloat with 64-bit XIDs,
> > specifically pg_xact and pg_multixact files.  Yes, that bloat is less
> > serious, but it is still an issue worth reporting in the server logs,
> > though not serious enough to stop the server from write queries.
>
> That's definitely a big part of it.
>
> Again, I don't believe that the idea is fundamentally without merit.
> Just that it's not worth it, given that having more XID space is very
> much not something that I think fixes most of the problems. And given
> the real risk of serious bugs with something this invasive.


> I believe that it would be more useful to focus on just not getting
> into trouble in the first place, as well as on mitigating specific
> problems that lead to the system reaching xidStopLimit in practice. I
> don't think that there is any good reason to allow datfrozenxid to go
> past about a billion. When it does the interesting questions are
> questions about what went wrong, and how that specific failure can be
> mitigated in a fairly direct way.
>
> We've already used way to much "XID space runway", so why should using
> even more help? It might, I suppose, but it almost seems like a
> distraction to me, as somebody that wants to make things better for
> users in general. As long as the system continues to misbehave (in
> whatever way it happens to be misbehaving), why should any amount of
> XID space ever be enough?
>

So I think the problem is that PostgreSQL is becoming more and more
scalable, hardware is becoming more capable, and certain use cases are
continuing to scale up.  Over time, we tend to find ourselves approaching
the end of the runway at ever higher velocities.  That's a problem that
will get significantly worse over time.

Of course, as I think we agree, the priorities should be (in order):
1.  Avoid trouble
2.  Recover from trouble early
3.  Provide more and better options for recovery.

I think 64-bit XIDs are a very good idea, but they really fit in this bottom
tier.  Not being up against mathematical limits of the software when
things are going bad is certainly a good thing.  But I am really worried
about the attitude that this patch avoids trouble, because in many
cases I don't think it does, and therefore I believe we need to make sure
we are not reducing visibility into the underlying problems.

>
> I think that we'll be able to get rid of freezing in a few years time.
> But as long as we have freezing, we have these problems.
>
> --
> Peter Geoghegan
>


Re: Add 64-bit XIDs into PostgreSQL 15

2022-11-29 Thread Chris Travers
Hi;

I suppose I must not have been clear in what I am suggesting we do and
why.   I will try to answer specific points below and then restate what I
think the problem is, and what I think should be done about it.

On Mon, Nov 28, 2022 at 5:53 PM Robert Haas  wrote:

> On Sat, Nov 26, 2022 at 4:08 AM Chris Travers 
> wrote:
> > I didn't see any changes to pg_upgrade to make this change possible on
> upgrade.  Is that also outside of the scope of your patch set?  If so how
> is that continuity supposed to be ensured?
>
> The scheme is documented in their 0006 patch, in a README.XID file.
> I'm not entirely confident that it's the best design and have argued
> against it in the past, but it's not crazy.
>

Right.  From the previous discussion I thought there was some possibility of
allowing people to run with the existing behavior.  I must have been
mistaken.  If that is off the table, then pg_upgrade and runtime replication
checks don't matter.

>
> More generally, while I think there's plenty of stuff to be concerned
> about in this patch set and while I'm somewhat skeptical about the
> likelihood of its getting or staying committed, I can't really
> understand your concerns in particular. The thrust of your concern
> seems to be that if we allow people to get further behind, recovery
> will be more difficult. I'm not sure I see the problem. Suppose that
> we adopt this proposal and that it is bug-free. Now, consider a user
> who gets 8 billion XIDs behind. They probably have to vacuum pretty
> much every page in the database to do that, or at least every page in the
> tables that haven't been vacuumed recently. But that would likely also
> be true if they were 800 million XIDs behind, as is possible today.
> The effort to catch up doesn't increase linearly with how far behind
> you are, and is always bounded by the DB size.
>

Right.  I agree with all of that.

>
> It is true that if the table is progressively bloating, it is likely
> to be more bloated by the time you are 8 billion XIDs behind than it
> was when you were 800 million XIDs behind. I don't see that as a very
> good reason not to adopt this patch, because you can bloat the table
> by an arbitrarily large amount while consuming only a small number of
> XiDs, even just 1 XID. Protecting against bloat is good, but shutting
> down the database when the XID age reaches a certain value is not a
> particularly effective way of doing that, so saying that we'll be
> hurting people by not shutting down the database at the point where we
> do so today doesn't ring true to me. I think that most people who get
> to the point of wraparound shutdown have workloads where bloat isn't a
> huge issue, because those who do start having problems with the bloat
> way before they run out of XIDs.
>

To be clear, I never suggested shutting down the database.  What I have
suggested is that repurposing the current approaching-xid-wraparound
warnings to start complaining loudly when a threshold is exceeded would be
helpful.  I think it makes sense to make that threshold configurable,
especially if we eventually have people running bloat-free table structures.

There are two fundamental problems here.  The first is that if, as you say,
a table is progressively bloating and we are getting further and further
behind on vacuuming and freezing, something is seriously wrong and we need
to do something about it.  In these cases, my experience is that
vacuuming and related tools tend to suffer degraded performance, and
determining how to solve the problem takes quite a bit more time than a
routine bloat issue would.  So what I am arguing against is treating the
problem just as a bloat issue.  If you get there due to vacuum being slow,
something else is wrong and you are probably going to have to find and fix
that as well in order to catch up.  At least that's my experience.

I don't object to the db continuing to run, allocate xids, etc.  What I
object to is it doing so silently when things are almost certainly
going very wrong.

>
> It would be entirely possible to add a parameter to the system that
> says "hey, you know we can keep running even if we're a shazillion
> XiDs behind, but instead shut down when we are behind by this number
> of XIDs." Then, if somebody wants to force an automatic shutdown at
> that point, they could, and I think that then the scenario you're
> worried about just can't happen any more . But isn't that a little bit
> silly? You could also just monitor how far behind you are and page the
> DBA when you get behind by more than a certain number of XIDs. Then,
> you wouldn't be risking a shutdown, and you'd still be able to stay on
> top of the XID ages of your tables.
>
> Philosophically, I disagree with the idea of shutting down the
> database completely in any situation in

Re: Add 64-bit XIDs into PostgreSQL 15

2022-11-26 Thread Chris Travers
Hi;

Trying to figure out where we are talking past each other.

On Fri, Nov 25, 2022 at 9:38 AM Aleksander Alekseev <
aleksan...@timescale.com> wrote:

> Hi hackers,
>
> > I'm wondering whether the safest way to handle this is by creating a
> > new TAM called "heap64", so that all storage changes happens there.
>
> > Many current users see stability as one of the greatest strengths of
> > Postgres, so while I very much support this move, I wonder if this
> > gives us a way to have both stability and innovation at the same time?
>
> That would be nice.
>
> However from what I see TransactionId is a type used globally in
> PostgresSQL. It is part of structures used by TAM interface, used in
> WAL records, etc. So we will have to learn these components to work
> with 64-bit XIDs anyway and then start thinking about cases like: when
> a user runs a transaction affecting two tables, a heap32 one and
> heap64 one and we will have to figure out which tuples are visible and
> which are not. This perhaps is doable but the maintenance burden for
> the project will be too high IMO.
>
> It seems to me that the best option we can offer for the users looking
> for stability is to use the latest PostgreSQL version with 32-bit
> XIDs. Assuming these users care that much about this particular design
> choice of course.
>

I didn't see any changes to pg_upgrade to make this change possible on
upgrade.  Is that also outside of the scope of your patch set?  If so how
is that continuity supposed to be ensured?

Also related to that, I think you would have to have a check on streaming
replication that both instances use the same xid format (that you don't
accidentally upgrade this somehow), since this is set per db cluster, right?

>
> > The whole project seems to just ignore basic, pertinent questions.
> > Questions like: why are we falling behind like this in the first
> > place? And: If we don't catch up soon, why should we be able to catch
> > up later on? Falling behind on freezing is still a huge problem with
> > 64-bit XIDs.
>
> Is the example I provided above wrong?
>
> """
> Consider the case when you run a slow OLAP query that takes 12h to
> complete and 100K TPS of fast OLTP-type queries on the same system.
> The fast queries will consume all 32-bit XIDs in less than 12 hours,
> while the OLAP query started 12 hours ago didn't finish yet and thus
> its tuples can't be frozen.
> """
>
> If it is, please let me know. I would very much like to know if my
> understanding here is flawed.
>

So, you have described a scenario we cannot support today (because xids
would be exhausted within 5.5 hours at that transactional rate).
Additionally, as PostgreSQL becomes more capable, this sort of scale will
increasingly be within reach, and that is an important point in favor of
this effort.
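
For reference, the back-of-the-envelope arithmetic behind that figure (roughly
2 billion usable xids before the stop limit, consumed at 100K TPS):

SELECT (2 * 10^9) / 100000 AS seconds_to_exhaustion,
       (2 * 10^9) / 100000 / 3600.0 AS hours_to_exhaustion;  -- about 5.5 hours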

This being said, there is another set of xid wraparound cases, today much
larger in number, that I think would be hurt if this patchset were accepted
into Postgres without mitigating measures which you consider out of bounds:
cases like Mailchimp, Adjust, and the like.  This is
why I keep stressing this, and I don't think waving away concerns about
use cases outside of the one you are focusing on is helpful, particularly
from those of us who have faced xid wraparounds in these cases in the
past.  In these cases, database teams are usually faced with an operational
emergency while tools like vacuum, pg_repack, etc are severely degraded due
to getting so far behind on freezing.  The deeper the hole, the harder it
will be to dig out of.

Every large-scale high-throughput database I have ever worked on had
long-running query alerts precisely because of the impact on vacuum and the
downstream performance impacts.   I would love to get to a point where this
wasn't necessary and maybe in a few specific workloads we might be there
very soon.  The effort you are engaging in here is an important part of the
path to get there, but let's not forget the people who today are facing xid
wraparounds due to vacuum problems and what this sort of set of changes
will mean for them.

-- 
> Best regards,
> Aleksander Alekseev
>
>
>


Re: Add 64-bit XIDs into PostgreSQL 15

2022-11-23 Thread Chris Travers
On Tue, Nov 22, 2022 at 10:01 AM Aleksander Alekseev <
aleksan...@timescale.com> wrote:

> Hi Chris,
>
> > Right now the way things work is:
> > 1.  Database starts throwing warnings that xid wraparound is approaching
> > 2.  Database-owning team initiates an emergency response, may take
> downtime or degradation of services as a result
> > 3.  People get frustrated with PostgreSQL because this is a reliability
> problem.
> >
> > What I am worried about is:
> > 1.  Database is running out of space
> > 2.  Database-owning team initiates an emergency response and takes more
> downtime to into a good spot
> > 3.  People get frustrated with PostgreSQL because this is a reliability
> problem.
> >
> > If that's the way we go, I don't think we've solved that much.  And as
> humans we also bias our judgments towards newsworthy events, so rarer, more
> severe problems are a larger perceived problem than the more routine, less
> severe problems.  So I think our image as a reliable database would suffer.
> >
> > An ideal resolution from my perspective would be:
> > 1.  Database starts throwing warnings that xid lag has reached severely
> abnormal levels
> > 2.  Database owning team initiates an effort to correct this, and does
> not take downtime or degradation of services as a result
> > 3.  People do not get frustrated because this is not a reliability
> problem anymore.
> >
> > Now, 64-bit XIDs are necessary to get us there but they are not
> sufficient.  One needs to fix the way we handle this sort of problem.
> There is existing logic to warn if we are approaching xid wraparound.  This
> should be changed to check how many xids we have used rather than remaining
> and have a sensible default there (optionally configurable).
> >
> > I agree it is not vacuum's responsibility.  It is the responsibility of
> the current warnings we have to avoid more serious problems arising from
> this change.  These should just be adjusted rather than dropped.
>
> I disagree with the axiom that XID wraparound is merely a symptom and
> not a problem.
>

XID wraparound doesn't happen to healthy databases, nor does it happen to
databases actively monitoring this possibility.  In the cases where it
happens, two circumstances are present:

1.  Autovacuum is stalled, and
2.  Monitoring is not checking for xid lag (lag which autovacuum would keep
in check if it were running properly).

XID wraparound is downstream of those problems.  At least that is my
experience.  If you disagree, I would like to hear why.
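
For the avoidance of doubt, by monitoring xid lag I mean checks along these
lines; nothing patch-specific, just the existing catalogs:

SELECT c.relname,
       age(c.relfrozenxid) AS xid_age,
       s.last_autovacuum
FROM pg_class c
JOIN pg_stat_user_tables s ON s.relid = c.oid
ORDER BY age(c.relfrozenxid) DESC
LIMIT 10;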

Additionally, those problems will still cause worse outages with this change
unless there are some mitigating measures in place.  If you don't like my
proposal, I would be open to other mitigating measures.  But I think there
need to be mitigating measures in a change like this.

>
> Using 32-bit XIDs was a reasonable design decision back when disk
> space was limited and disks were slow. The drawback of this approach
> is the need to do the wraparound but agaig back then it was a
> reasonable design choice. If XIDs were 64-bit from the beginning users
> could run one billion (1,000,000,000) TPS for 584 years without a
> wraparound. We wouldn't have it similarly as there is no wraparound
> for WAL segments. Now when disks are much faster and much cheaper
> 32-bit XIDs are almost certainly not a good design choice anymore.
> (Especially considering the fact that this particular patch mitigates
> the problem of increased disk consumption greatly.)
>

I agree that 64-bit xids are a good idea.  I just don't think that existing
safety measures should be ignored or reverted.

>
> Also I disagree with an argument that a DBA that doesn't monitor disk
> space would care much about some strange warnings in the logs. If a
> DBA doesn't monitor basic system metrics I'm afraid we can't help this
> person much.
>

The problem isn't just the lack of disk space, but the difficulty that
stuck autovacuum runs pose in resolving the issue.  Keep in mind that
everything you can do to reclaim disk space (vacuum full, cluster,
pg_repack) will be significantly slowed down by an extremely bloated
table/index combination.  The problem is that if you are running out of disk
space, and your time to recovery is much longer than expected, then you have a
major problem.  It's not just one or the other, but the combination that
poses the real risk here.

Now, that's fine if you want to run a bloat-free table engine, but to my
knowledge none of these are production-ready yet.  ZHeap seems mostly
stalled.  Oriole is still experimental.  So with the current PostgreSQL
table structure, that combination of risks remains.

A DBA can monitor disk space, but if the DBA is not also monitoring xid
lag, then by the time corrective action is taken it may be too late.


> I do agree that we could probably provide some additional help for the
> rest of the users when it comes to configuring VACUUM. This is indeed
> non-trivial. However I don't think this is in scope of this particular
> patchset. I suggest we keep the focus 

Re: Add 64-bit XIDs into PostgreSQL 15

2022-11-21 Thread Chris Travers
On Mon, Nov 21, 2022 at 10:40 AM Pavel Borisov 
wrote:

> > I have a very serious concern about the current patch set. as someone
> who has faced transaction id wraparound in the past.
> >
> > I can start by saying I think it would be helpful (if the other issues
> are approached reasonably) to have 64-bit xids, but there is an important
> piece of context in preventing xid wraparounds that seems missing from this
> patch unless I missed something.
> >
> > XID wraparound is a symptom, not an underlying problem.  It usually
> occurs when autovacuum or other vacuum strategies have unexpected stalls
> and therefore fail to work as expected.  Shifting to 64-bit XIDs
> dramatically changes the sorts of problems that these stalls are likely to
> pose to operational teams.  -- you can find you are running out of storage
> rather than facing an imminent database shutdown.  Worse, this patch delays
> the problem until some (possibly far later!) time, when vacuum will take
> far longer to finish, and options for resolving the problem are
> diminished.  As a result I am concerned that merely changing xids from
> 32-bit to 64-bit will lead to a smaller number of far more serious outages.
> >
> > What would make a big difference from my perspective would be to combine
> this with an inverse system for warning that there is a problem, allowing
> the administrator to throw warnings about xids since last vacuum, with a
> configurable threshold.  We could have this at two billion by default as
> that would pose operational warnings not much later than we have now.
> >
> > Otherwise I can imagine cases where instead of 30 hours to vacuum a
> table, it takes 300 hours on a database that is short on space.  And I
> would not want to be facing such a situation.
>
> Hi, Chris!
> I had a similar stance when I started working on this patch. Of
> course, it seemed horrible just to postpone the consequences of
> inadequate monitoring, too long running transactions that prevent
> aggressive autovacuum etc. So I can understand your point.
>
> With time I've got to a little bit of another view of this feature i.e.
>
> 1. It's important to correctly set monitoring, the cut-off of long
> transactions, etc. anyway. It's not the responsibility of vacuum
> before wraparound to report inadequate monitoring etc. Furthermore, in
> real life, this will be already too late if it prevents 32-bit
> wraparound and invokes much downtime in an unexpected moment of time
> if it occurs already. (The rough analogy for that is the machine
> running at 120mph turns every control off and applies full brakes just
> because the cooling liquid is low (of course there might be a warning
> previously, but anyway))
>

So I disagree with you on a few critical points here.

Right now the way things work is:
1.  Database starts throwing warnings that xid wraparound is approaching
2.  Database-owning team initiates an emergency response, may take downtime
or degradation of services as a result
3.  People get frustrated with PostgreSQL because this is a reliability
problem.

What I am worried about is:
1.  Database is running out of space
2.  Database-owning team initiates an emergency response and takes more
downtime to get into a good spot
3.  People get frustrated with PostgreSQL because this is a reliability
problem.

If that's the way we go, I don't think we've solved that much.  And as
humans we also bias our judgments towards newsworthy events, so rarer, more
severe problems are a larger perceived problem than the more routine, less
severe problems.  So I think our image as a reliable database would suffer.

An ideal resolution from my perspective would be:
1.  Database starts throwing warnings that xid lag has reached severely
abnormal levels
2.  Database owning team initiates an effort to correct this, and does not
take downtime or degradation of services as a result
3.  People do not get frustrated because this is not a reliability problem
anymore.

Now, 64-bit XIDs are necessary to get us there but they are not
sufficient.  One needs to fix the way we handle this sort of problem.
There is existing logic to warn if we are approaching xid wraparound.  This
should be changed to check how many xids we have used rather than remaining
and have a sensible default there (optionally configurable).

I agree it is not vacuum's responsibility.  It is the responsibility of the
current warnings we have to avoid more serious problems arising from this
change.  These should just be adjusted rather than dropped.


> 2. The checks and handlers for the event that is never expected in the
> cluster lifetime (~200 years at constant rate of 1e6 TPS) can be just
> dropped. Of course we still need to do automatic routine maintenance
> like cutting SLRU buffers (but with a much bigger interval if we have
> much disk space e.g.). But I considered that we either can not care
> what will be with cluster after > 200 years (it will be migrated many
> times before this, on many reasons not 

Re: Add 64-bit XIDs into PostgreSQL 15

2022-11-21 Thread Chris Travers
On Mon, Nov 21, 2022 at 12:25 PM Aleksander Alekseev <
aleksan...@timescale.com> wrote:

> Hi hackers,
>
> > > I have a very serious concern about the current patch set. as someone
> who has faced transaction id wraparound in the past.
> >
> > [...]
> >
> > I had a similar stance when I started working on this patch. Of
> > course, it seemed horrible just to postpone the consequences of
> > inadequate monitoring, too long running transactions that prevent
> > aggressive autovacuum etc. So I can understand your point.
> >
> > With time I've got to a little bit of another view of this feature i.e.
> >
> > 1. It's important to correctly set monitoring, the cut-off of long
> > transactions, etc. anyway. It's not the responsibility of vacuum
> > before wraparound to report inadequate monitoring etc. Furthermore, in
> > real life, this will be already too late if it prevents 32-bit
> > wraparound and invokes much downtime in an unexpected moment of time
> > if it occurs already. (The rough analogy for that is the machine
> > running at 120mph turns every control off and applies full brakes just
> > because the cooling liquid is low (of course there might be a warning
> > previously, but anyway))
> >
> > 2. The checks and handlers for the event that is never expected in the
> > cluster lifetime (~200 years at constant rate of 1e6 TPS) can be just
> > dropped. Of course we still need to do automatic routine maintenance
> > like cutting SLRU buffers (but with a much bigger interval if we have
> > much disk space e.g.). But I considered that we either can not care
> > what will be with cluster after > 200 years (it will be migrated many
> > times before this, on many reasons not related to Postgres even for
> > the most conservative owners). So the radical proposal is to drop
> > 64-bit wraparound at all. The most moderate one is just not taking
> > very much care that after 200 years we have more hassle than next
> > month if we haven't set up everything correctly. Next month's pain
> > will be more significant even if it teaches dba something.
> >
> > Big thanks for your view on the general implementation of this feature,
> anyway.
>
> I'm inclined to agree with Pavel on this one. Keeping 32-bit XIDs in
> order to intentionally trigger XID wraparound to indicate the ending
> disk space and/or misconfigured system (by the time when it's usually
> too late anyway) is a somewhat arguable perspective. It would be great
> to notify the user about the potential issues with the configuration
> and/or the fact that VACUUM doesn't catch up. But it doesn't mean we
> should keep 32-bit XIDs in order to achive this.
>

That's not what I am suggesting.  However, I am saying that removing a
symptom of a problem, so that you only get bitten when you are in an even
worse position, is a bad idea, and I would strenuously oppose including a
patchset like this without some sort of mitigating measures.

What I think should be added to address this concern is a GUC variable such as
warn_max_xid_lag, changing the logic which warns of impending xid wraparound
into logic that warns once xid lag reaches a value in excess of
this threshold.
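
Purely as an illustration (warn_max_xid_lag does not exist; the name and
semantics are my proposal, not anything in the patch set), enabling it might
look like:

ALTER SYSTEM SET warn_max_xid_lag = 2000000000;  -- warn once lag exceeds ~2 billion xids
SELECT pg_reload_conf();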

Databases under load take a long time to correct problems and throwing new
problems onto DBA-land and saying "you figure out your monitoring" is not
something I want to support.  Again, this comes from experience facing xid
wraparound issues.  I have nothing against 64-bit xids.  But I think the
patch ought not to delay the onset of visible symptoms; instead the focus
should be on making sure that efforts to address those symptoms can be
handled using less extreme measures and at a somewhat more relaxed pace.


>
> --
> Best regards,
> Aleksander Alekseev
>
>
>


Re: Add 64-bit XIDs into PostgreSQL 15

2022-11-20 Thread Chris Travers
The following review has been posted through the commitfest application:
make installcheck-world:  not tested
Implements feature:   not tested
Spec compliant:   not tested
Documentation:not tested

I have a very serious concern about the current patch set. as someone who has 
faced transaction id wraparound in the past.

I can start by saying I think it would be helpful (if the other issues are 
approached reasonably) to have 64-bit xids, but there is an important piece of 
context in preventing xid wraparounds that seems missing from this patch unless 
I missed something.

XID wraparound is a symptom, not an underlying problem.  It usually occurs when 
autovacuum or other vacuum strategies have unexpected stalls and therefore fail 
to work as expected.  Shifting to 64-bit XIDs dramatically changes the sorts of 
problems that these stalls are likely to pose to operational teams -- you can 
find you are running out of storage rather than facing an imminent database 
shutdown.  Worse, this patch delays the problem until some (possibly far 
later!) time, when vacuum will take far longer to finish, and options for 
resolving the problem are diminished.  As a result I am concerned that merely 
changing xids from 32-bit to 64-bit will lead to a smaller number of far more 
serious outages.

What would make a big difference from my perspective would be to combine this 
with an inverse system for warning that there is a problem, allowing the 
administrator to throw warnings about xids since last vacuum, with a 
configurable threshold.  We could have this at two billion by default as that 
would pose operational warnings not much later than we have now.

Otherwise I can imagine cases where instead of 30 hours to vacuum a table, it 
takes 300 hours on a database that is short on space.  And I would not want to 
be facing such a situation.

The new status of this patch is: Waiting on Author


Re: recovering from "found xmin ... from before relfrozenxid ..."

2020-08-18 Thread Chris Travers
On Tue, Jul 14, 2020 at 12:28 AM Peter Geoghegan  wrote:

> On Mon, Jul 13, 2020 at 2:12 PM Robert Haas  wrote:
> > 1. There's nothing to identify the tuple that has the problem, and no
> > way to know how many more of them there might be. Back-patching
> > b61d161c146328ae6ba9ed937862d66e5c8b035a would help with the first
> > part of this.
>
> I am in favor of backpatching such changes in cases where senior
> community members feel that it could help with hypothetical
> undiscovered data corruption issues -- if they're willing to take
> responsibility for the change. It certainly wouldn't be the first
> time. A "defense in depth" mindset seems like the right one when it
> comes to data corruption bugs. Early detection is really important.
>
> > Moreover, not everyone is as
> > interested in an extended debugging exercise as they are in getting
> > the system working again, and VACUUM failing repeatedly is a pretty
> > serious problem.
>
> That's absolutely consistent with my experience. Most users want to
> get back to business as usual now, while letting somebody else do the
> hard work of debugging.
>

Also, even if you do trace the problem, you still have to recover.

And sometimes I have found latent corruption from times when dbs were
running on older versions and older servers, making debugging largely a
futile exercise.

>
> > Therefore, one of my colleagues has - at my request - created a couple
> > of functions called heap_force_kill() and heap_force_freeze() which
> > take an array of TIDs.
>
> > So I have these questions:
> >
> > - Do people think it would me smart/good/useful to include something
> > like this in PostgreSQL?
>
> I'm in favor of it.
>

+1

Would be worth extending it with some functions to grab rows that have
various TOAST oids too.
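
For clarity, the usage I have in mind is along these lines.  The function names
and the tid-array argument are as described upthread; the relation argument is
my assumption, since the exact signature is still being worked out:

SELECT heap_force_freeze('damaged_table'::regclass,
                         ARRAY['(0,1)', '(0,2)']::tid[]);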

>
> > - If so, how? I would propose a new contrib module that we back-patch
> > all the way, because the VACUUM errors were back-patched all the way,
> > and there seems to be no advantage in making people wait 5 years for a
> > new version that has some kind of tooling in this area.
>
> I'm in favor of it being *possible* to backpatch tooling that is
> clearly related to correctness in a fundamental way. Obviously this
> would mean that we'd be revising our general position on backpatching
> to allow some limited exceptions around corruption. I'm not sure that
> this meets that standard, though. It's hardly something that we can
> expect all that many users to be able to use effectively.
>
> I may be biased, but I'd be inclined to permit it in the case of
> something like amcheck, or pg_visibility, on the grounds that they're
> more or less the same as the new VACUUM errcontext instrumentation you
> mentioned. The same cannot be said of something like this new
> heap_force_kill() stuff.
>
> > - Any ideas for additional things we should include, or improvements
> > on the sketch above?
>
> Clearly you should work out a way of making it very hard to
> accidentally (mis)use. For example, maybe you make the functions check
> for the presence of a sentinel file in the data directory.
>

Agreed.

>
>
> --
> Peter Geoghegan
>
>
>

-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: Making CASE error handling less surprising

2020-07-26 Thread Chris Travers
On Fri, Jul 24, 2020 at 7:18 PM Tom Lane  wrote:

> Robert Haas  writes:
> > Like Pavel, and I think implicitly Dagfinn and Andres, I'm not sure I
> > believe this. Pavel's example is a good one. The leakproof exception
> > helps, but it doesn't cover everything. Users I've encountered throw
> > things like date_trunc() and lpad() into SQL code and expect them to
> > behave (from a performance point of view) like constants, but they
> > also expect 1/0 not to get evaluated too early when e.g. CASE is used.
> > It's difficult to meet both sets of expectations at the same time and
> > we're probably never going to have a perfect solution, but I think
> > you're minimizing the concern too much here.
>
> I've quoted this point before, but ... we can make queries arbitrarily
> fast, if we don't have to give the right answer.  I think we've seen
> enough complaints on this topic now to make it clear that what we're
> doing today with CASE is the wrong answer.
>

So here's my concern in a little more detail.

For small databases, these performance concerns are not a big deal. But for
large, heavily loaded databases one tends to run into all of the
pathological cases more frequently.  In other words, the overhead for the
largest users will likely not be proportional to the gains for the newer
users who are surprised by the current behavior.  The more exceptions we make
to how the planner works, the more complex the knowledge required to work at
the high end of the database becomes.  So I just don't think the complexity
here is worth it.


> The performance argument can be made to cut both ways, too.  If somebody's
> got a very expensive function in a CASE arm that they don't expect to
> reach, having it be evaluated anyway because it's got constant inputs
> isn't going to make them happy.
>

However, in this case we would be evaluating the expensive CASE arm every
time it is invoked (i.e. for every row matched), right?  It is hard to see
this as being even close to a performance gain, or even approximately
neutral, because the cases where you have a significant gain are likely to
be extremely rare, and the penalty when the cost applies will be many
multiples of the maximum gain.
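
To spell out the trade-off: the behavior being changed is the well-known gotcha
below (not something introduced by the patch), and the flip side is that an
expensive immutable expression in a constant arm, which today is folded once at
plan time, would instead be evaluated for every matched row:

SELECT CASE WHEN x > 0 THEN x ELSE 1/0 END
FROM (VALUES (1), (2)) AS t(x);
-- ERROR:  division by zero, even though the ELSE arm is never reached
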

>
> The real bottom line is: if you don't want to do this, how else do
> you want to fix the problem?  I'm no longer willing to deny that
> there is a problem.
>

I see three ways forward.

The first (probably the best) would be a solution along the lines of yours,
together with a session-level GUC variable which could determine whether CASE
branches can fold constants.  This has several important benefits:

1.  It gets a fix in shortly for those who want it.
2.  It ensures this is optional behavior for the more experienced users
(where one can better decide which direction to go), and
3.  It makes the behavior explicit, documented, and thus more easily
understood.

Another approach would be to allow some sort of "constant evaluation
mechanism," maybe with its own memory context, where constants could be
cached on first evaluation under the statement memory context.  That would
solve the problem more generally.


>
> > I don't think I believe this either. I don't think an average user is
> > going to expect  to behave differently from (SELECT
> > ).
>
> Agreed, that's poorly (or not at all?) documented.  But it's been
> true all along, and this patch isn't changing that behavior at all.
> I'm not sure if we should do anything more than improve the docs,
> but in any case it seems independent of the CASE issue.
>
> > The current behavior isn't great, but at least it handles these
> > cases consistently.
>
> Really?
>
> regards, tom lane
>
>
>

-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: Making CASE error handling less surprising

2020-07-24 Thread Chris Travers
On Fri, Jul 24, 2020 at 4:35 AM Tom Lane  wrote:

> Andres Freund  writes:
> > I'm a bit worried about a case like:
>
> > CREATE FUNCTION yell(int, int)
> > RETURNS int
> > IMMUTABLE
> > LANGUAGE SQL AS $$
> >SELECT CASE WHEN $1 != 0 THEN 17 / $2 ELSE NULL END
> > $$;
>
> > EXPLAIN SELECT yell(g.i, 0) FROM generate_series(1, 10) g(i);
>
> > I don't think the parameters here would have been handled before
> > inlining, right?
>
> Ah, I see what you mean.  Yeah, that throws an error today, and it
> still would with the patch I was envisioning (attached), because
> inlining does Param substitution in a different way.  I'm not
> sure that we could realistically fix the inlining case with this
> sort of approach.
>
> I think this bears out the comment I made before that this approach
> still leaves us with a very complicated behavior.  Maybe we should
> stick with the previous approach, possibly supplemented with a
> leakproofness exception.
>


I am actually not so sure this is a good idea. Here are two doubts I have.

1.  The problem of when a given SQL expression is evaluated crops up in a
wide variety of contexts and, in the worst case, causes far more damage than
queries which always error.  Removing the lower-hanging fruit while leaving
cases like:

select lock_foo(id), * from foo where somefield > 100; -- which rows does
lock_foo(id) run on?  Does it matter?

is going to legitimize these complaints in a way that will be very hard to
satisfy unless we also eventually want to let users specify when volatile
functions may be run.  The two cases don't look the same, but they are
manifestations of the same problem: when you execute a SQL query you have no
control over when expressions are actually run.

2.  Refusing to fold immutable expressions within CASE arms means we have to
do more tricks to get around the planner whenever we hit a pathological
performance case.  I am not convinced this is a net win.

If we go this route, would it be too much to ask to allow a GUC variable to
preserve the old behavior?


> regards, tom lane
>
>

-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: Internal key management system

2020-02-02 Thread Chris Travers
llow the internal usage to have a fixed output length of
> > > > LEN(IV) + LEN(HMAC) + LEN(DATA) = 16 + 32 + 64 = 112 bytes.
> > >
> > > Probably you meant LEN(DATA) is 32? DATA will be an encryption key for
> > > AES256 (master key) internally generated.
> >
> > No it should be 64-bytes. That way we can have separate 32-byte
> > encryption key (for AES256) and 32-byte MAC key (for HMAC-SHA256).
> >
> > While it's common to reuse the same 32-byte key for both AES256 and an
> > HMAC-SHA256 and there aren't any known issues with doing so, when
> > designing something from scratch it's more secure to use entirely
> > separate keys.
>
> The HMAC key you mentioned above is not the same as the HMAC key
> derived from the user provided passphrase, right? That is, individual
> key needs to have its IV and HMAC key. Given that the HMAC key used
> for HMAC(IV || ENCRYPT(KEY, IV, DATA)) is the latter key (derived from
> passphrase), what will be the former key used for?
>
> >
> > > > For the user facing piece, padding would enabled to support arbitrary
> > > > input data lengths. That would make the output length grow by up to
> > > > 16-bytes (rounding the data length up to the AES block size) plus one
> > > > more byte if a version field is added.
> > >
> > > I think the length of padding also needs to be added to the output.
> > > Anyway, in the first version the same methods of wrapping/unwrapping
> > > key are used for both internal use and user facing function. And user
> > > input key needs to be a multiple of 16 bytes value.
> >
> > A separate length field does not need to be added as the
> > padding-enabled output will already include it at the end[1]. This
> > would be handled automatically by the OpenSSL encryption / decryption
> > operations (if it's enabled):
> >
>
> Yes, right.
>
> Regards,
>
> [1]
> https://www.postgresql.org/message-id/031401d3f41d%245c70ed90%241552c8b0%24%40lab.ntt.co.jp
> [2]
> https://www.postgresql.org/message-id/CAD21AoD8QT0TWs3ma-aB821vwDKa1X519y1w3yrRKkAWjhZcrw%40mail.gmail.com
>
> --
> Masahiko Sawadahttp://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
>
>

-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: ERROR: tuple concurrently updated when modifying privileges

2019-05-14 Thread Chris Travers
On Tue, May 14, 2019 at 9:11 AM Michael Paquier  wrote:

> On Tue, May 14, 2019 at 08:08:05AM +0200, Chris Travers wrote:
> > Having thought about this a bit, I think the best solution would be to
> have
> > grant take out an access share lock to the tables granted.  This would
> > prevent concurrent alter table operations from altering the schema
> > underneath the grant as well, and thus possibly cause other race
> conditions.
> >
> > Any thoughts?
>
> "tuple concurrently updated" is an error message which should never be
> user-facing, and unfortunately there are many scenarios where it can
> be triggered by playing with concurrent DDLs:
> https://postgr.es/m/20171228063004.gb6...@paquier.xyz
>
> If you have an idea of patch, could you write it?  Having an isolation
> test case would be nice as well.
>

I will give Nick a chance to do the patch if he wants it (I have reached
out).  Otherwise sure.

I did notice one more particularly exotic corner case that is not resolved
by this proposed fix.

If you have two transactions which try to grant on the same entity (table,
etc.), *both* will typically fail with the same error.

I am not sure that is a bad thing, because I am not sure how concurrent
grants are supposed to work under MVCC, but I think fixing it would require a
fundamentally different approach.
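
To make that concrete, the proposed lock would effectively serialize things
the way this manual sketch does today (only a sketch, using the tables from
Nick's reproduction):

-- session 2, as a stand-in for the proposed GRANT behavior
begin;
lock table trun in access share mode;   -- waits for session 1's TRUNCATE to commit
grant insert on table trun to postgres; -- now sees the updated pg_class row
end;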


> --
> Michael
>


-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: ERROR: tuple concurrently updated when modifying privileges

2019-05-14 Thread Chris Travers
On Tue, Apr 30, 2019 at 11:26 AM nickb  wrote:

> Hello, hackers
>
> we witnessed this slightly misleading error in production and it took us a
> while to figure out what was taking place.
> Below are reproduction steps:
>
>
> -- setup
> create table trun(cate int4);
>
> -- session 1
> begin;
> truncate table trun;
>
> -- session 2
> grant insert on table trun to postgres;
>
> -- session 1
> end;
>
> -- session 2:
> ERROR: XX000: tuple concurrently updated
> LOCATION: simple_heap_update, heapam.c:4474
>
> Apparently the tuple in question is the pg_class entry of the table being
> truncated. I didn't look too deep into the cause, but I'm certain the error
> message could be improved at least.
>

Having thought about this a bit, I think the best solution would be to have
GRANT take out an access share lock on the tables being granted.  This would
also prevent concurrent ALTER TABLE operations from altering the schema
underneath the grant and thus possibly causing other race conditions.

Any thoughts?

>
> Regards,
> Nick.
>
>
>

-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: PostgreSQL pollutes the file system

2019-04-12 Thread Chris Travers
On Fri, Apr 12, 2019 at 3:20 PM Alvaro Herrera 
wrote:

> On 2019-Apr-12, Daniel Gustafsson wrote:
>
> > There are many good reasons for the changes proposed in this thread, but
> I'm
> > not sure if discoverability is one.  Relying on autocompleting a
> filename to
> > figure out existing tooling for database maintenance and DBA type
> operations
> > seems like a fragile usecase.
> >
> > If commandline discoverability is of importance, providing a summary of
> the
> > tools in "man postgresql" seems like a better option.
>
> The first comment in the LWN article:
>  "It's broken and obviously a bad idea but we've been doing it for so long
> we
>   shouldn't attempt to fix it"
>
> IMO the future is longer than the past, and has more users, so let's do
> it right instead of perpetuating the mistakes.
>

I agree we should look at fixing these.  However I have two concerns.

1. Naming things is surprisingly hard.  How sure are we that we can do this
right?  Can we come up with a correct name for initdb?  Maybe
pg_createcluster?
2. How long would our deprecation cycle be?  5 years?  10 years?  Given that
people may need to support multiple versions, I would propose no warnings
until every supported release ships both names, then warnings for two years,
then dropping the old names.

>
>
> ... unless you think PostgreSQL is going to become irrelevant before
> 2050.
>
> --
> Álvaro Herrera    https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: Berserk Autovacuum (let's save next Mandrill)

2019-04-10 Thread Chris Travers
On Sat, Apr 6, 2019 at 9:56 AM Darafei "Komяpa" Praliaskouski 
wrote:

> The invoking autovacuum on table based on inserts, not only deletes
>> and updates, seems good idea to me. But in this case, I think that we
>> can not only freeze tuples but also update visibility map even when
>> setting all-visible. Roughly speaking  I think vacuum does the
>> following operations.
>>
>> 1. heap vacuum
>> 2. HOT pruning
>> 3. freezing tuples
>> 4. updating visibility map (all-visible and all-frozen)
>> 5. index vacuum/cleanup
>> 6. truncation
>>
>> With the proposed patch[1] we can control to do 5 or not. In addition
>> to that, another proposed patch[2] allows us to control 6.
>>
>
> [1] is committed, [2] nears commit. Seems we have now all the infra to
> teach autovacuum to run itself based on inserts and not hurt anybody?
>
> ...
>
>> [1] https://commitfest.postgresql.org/22/1817/
>> [2] https://commitfest.postgresql.org/22/1981/
>>
>
>
Reading the thread and the patch, I generally agree that:
1.  With the current infrastructure, having autovacuum periodically scan
append-only tables for freezing would be good, and
2.  I can't think of any cases where this would be a bad thing.

Also I am not 100% convinced that the problems are avoidable by setting the
wraparound prevention thresholds low enough.  In cases where one is doing
large bulk inserts all the time, vacuum freeze could have a lot of work to
do, and in some cases I could imagine IO storms making that difficult.

I plan to run some benchmarks on this to try to assess the performance impact
of the patch in standard pgbench scenarios.  I will also try to come up with
some other benchmarks for append-only workloads.
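
For the append-only piece, I have in mind something like the following; the
table and numbers are just a sketch, driven by pgbench with a custom script:

-- hypothetical append-only benchmark table
create table append_only_log (
    id bigserial primary key,
    payload text not null,
    created_at timestamptz not null default now()
);

-- body of a custom pgbench script (e.g. pgbench -n -c 8 -T 600 -f insert_only.sql)
insert into append_only_log (payload) values (repeat('x', 100));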


>
> --
> Darafei Praliaskouski
> Support me: http://patreon.com/komzpa
>


-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Proposal: autovacuum_max_queue_depth

2019-03-29 Thread Chris Travers
Hi everyone.  I would like to flesh this proposal out with some feedback
before creating a patch.

The Problem

In large production systems you often run into problems when autovacuum is
not tuned aggressively enough.  This leads to long autovacuum runs when they
do happen, and autovacuum eventually ends up causing problems of its own.  A
major difficulty is that you cannot simply make autovacuum more aggressive,
because then you get very long autovacuum queues, which means very hot tables
might not be vacuumed before they become close to unusable.

Example:

Imagine you have a 2TB database with 6000 tables with volumes ranging from
a few MB to 100GB in size per table.  You tune autovacuum to make it more
aggressive and know you can handle 5 in parallel.  So you set
autovacuum_vacuum_scale_factor to a much lower value.
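For instance (illustrative values only, using the stock GUCs):

alter system set autovacuum_max_workers = 5;             -- needs a restart
alter system set autovacuum_vacuum_scale_factor = 0.01;  -- much more aggressive
select pg_reload_conf();
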

On the next autovacuum run, autovacuum detects that 3000 tables need to be
vacuumed, and so creates 5 queues of 600 tables each.  Nothing gets added to
these queues until a queue completely empties.

In my experience I have not seen a case where analyze poses the same problem,
but my solution would fold it in anyway.

Current workarounds.

1.  Periodically kill autovacuum sessions, forcing queue recalculation.
2.  Manually prevacuum everything that exceeds desired thresholds.


Proposed Solution

I would propose a new GUC variable, autovacuum_max_queue_depth, defaulting
to 0 (no limit).

When autovacuum starts a run, it would sort the tables according to the
following formula if n_dead_tup > 0:

((n_dead_tup - autovac_threshold) / (n_dead_tup + n_live_tup) -
(autovacuum_scale_factor * (n_dead_tup)/(n_live_tup + n_dead_tup)))

For analyze runs, the number of inserts since the last analyze would be added
to n_dead_tup.
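
For illustration, the resulting ordering could be previewed from the
statistics views roughly like this; it is only a sketch and approximates
per-table reloptions with the global GUCs:

select schemaname, relname,
       (n_dead_tup - current_setting('autovacuum_vacuum_threshold')::float8)
           / nullif(n_dead_tup + n_live_tup, 0)
       - current_setting('autovacuum_vacuum_scale_factor')::float8
           * n_dead_tup / nullif(n_live_tup + n_dead_tup, 0) as priority
  from pg_stat_user_tables
 where n_dead_tup > 0
 order by priority desc
 limit 10;  -- i.e. autovacuum_max_queue_depth = 10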

Then the top autovacuum_max_queue_depth rows would be added to each
autovacuum worker's queue.

In the scenario presented above, if autovacuum_max_queue_depth were set to,
say, 10, each autovacuum worker would exit after vacuuming 10 tables and be
restarted the next time autovacuum wakes up.

The goal here is to ensure that very hot tables rise to the top of the
queue and are vacuumed frequently even after setting Autovacuum to be far
more aggressive on a large production database.

Thoughts?  Feedback?  Waiting for a patch?

-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: PostgreSQL pollutes the file system

2019-03-21 Thread Chris Travers
On Thu, Mar 21, 2019 at 1:49 AM Michael Paquier  wrote:

> On Thu, Mar 21, 2019 at 08:41:32AM +0900, Tatsuo Ishii wrote:
> >> Can someone describe a scenario where this (name of the binary not
> >> clearly indicating it's related postgres) causes issues in practice? On
> >> my system, there are ~1400 binaries in /usr/bin, and for the vast
> >> majority of them it's rather unclear where do they come from.
>
> Naming conflict because our binary names are too generic?  createdb
> could for example be applied to any database, and not only Postgres.
> (I have 1600 entries in /usr/bin on a Debian installation.)
>

I generally agree with Tom that there is sufficient precedent here that we
don't need to worry about these conflicts per se.  However, I would add two
points we might want to think about:

1.  createuser/dropuser are not, in my view, good ways of creating users
anyway.  I think we should just consider removing these binaries.  The SQL
commands are better, more functional, and can be rolled back as part of a
larger transaction (see the sketch after this list).

2.  initdb is not so much of a pressing issue, but I think that despite the
longer string, pg_ctl -D mydatadir init [options] would be clearer from a
new user's perspective and impose less cognitive load.
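
To illustrate the first point, here is a minimal sketch of the SQL route; the
role names are made up for the example:

-- roughly what createuser does, but composable with other DDL in one transaction
BEGIN;
CREATE ROLE app_user LOGIN PASSWORD 'changeme';
GRANT app_readers TO app_user;   -- assumes an existing app_readers group role
-- ... other provisioning steps ...
COMMIT;   -- or ROLLBACK, which also undoes the role creation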

>
> >>
> >> But it's not really an issue, because we have tools to do that
> >>
> >> 1) man
> >>
> >> 2) -h/--help
> >>
> >> 3) rpm -qf $file (and similarly for other packagers)
> >>
> >> 4) set --prefix to install binaries so separate directory (which some
> >> distros already do anyway)
> >>
> >> So to me this seems like a fairly invasive change (potentially breaking
> >> quite a few scripts/tools) just to address a minor inconvenience.
> >
> > +1.
>
> Yes, +1.
> --
> Michael
>


-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: PostgreSQL pollutes the file system

2019-03-20 Thread Chris Travers
On Wed, Mar 20, 2019 at 11:06 AM Andreas Karlsson  wrote:

> On 3/19/19 11:19 AM, Fred .Flintstone wrote:
> > PostgreSQL pollutes the file system with lots of binaries that it is
> > not obvious that they belong to PostgreSQL.
> >
> > Such as "/usr/bin/createdb", etc.
> >
> > It would be better if these files were renamed to be prefixed with
> > pg_, such as pg_createdb.
> > Or even better postgresql-createdb then be reachable by through a
> > "postgresql" wrapper script.
>
> Hi,
>
> This topic has been discussed before e.g. in 2008 in
> https://www.postgresql.org/message-id/47EA5CC0.8040102%40sun.com and
> also more recently but I cannot find it in the archives right now.
>
> I am personally in favor of renaming e.g. createdb to pg_createdb, since
> it is not obvious that createdb belongs to PostgreSQL when reading a
> script or looking in /usr/bin, but we would need a some kind of
> deprecation cycle here or we would suddenly break tons of people's scripts


I wouldn't be opposed to this, but I would note two points on a deprecation
cycle:
1. Given that people may have tools that work with all supported versions
of PostgreSQL, this needs to be a long cycle, and
2. Managing that cycle makes it a little bit of a tough sell.

> .
>
> And as for the git-like solution with a wrapper script, that seems to be
> the modern way to do things but would be an even larger breakage and I
> am not convinced the advantage would be worth it especially since our
> executables are not as closely related and consistent as for example git's.
>

Git commands may be related, but I would actually argue that git commands
have a lot of inconsistency precisely because of this structure.

See, for example, http://stevelosh.com/blog/2013/04/git-koans/


>
> Andreas
>
>

-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: [HACKERS] Custom compression methods

2019-03-19 Thread Chris Travers
On Tue, Mar 19, 2019 at 12:19 PM Tomas Vondra 
wrote:

>
> On 3/19/19 10:59 AM, Chris Travers wrote:
> >
> >
> > Not discussing whether any particular committer should pick this up but
> > I want to discuss an important use case we have at Adjust for this sort
> > of patch.
> >
> > The PostgreSQL compression strategy is something we find inadequate for
> > at least one of our large deployments (a large debug log spanning
> > 10PB+).  Our current solution is to set storage so that it does not
> > compress and then run on ZFS to get compression speedups on spinning
> disks.
> >
> > But running PostgreSQL on ZFS has some annoying costs because we have
> > copy-on-write on copy-on-write, and when you add file fragmentation... I
> > would really like to be able to get away from having to do ZFS as an
> > underlying filesystem.  While we have good write throughput, read
> > throughput is not as good as I would like.
> >
> > An approach that would give us better row-level compression  would allow
> > us to ditch the COW filesystem under PostgreSQL approach.
> >
> > So I think the benefits are actually quite high particularly for those
> > dealing with volume/variety problems where things like JSONB might be a
> > go-to solution.  Similarly I could totally see having systems which
> > handle large amounts of specialized text having extensions for dealing
> > with these.
> >
>
> Sure, I don't disagree - the proposed compression approach may be a big
> win for some deployments further down the road, no doubt about it. But
> as I said, it's unclear when we get there (or if the interesting stuff
> will be in some sort of extension, which I don't oppose in principle).
>

I would assume that if extensions are particularly stable and useful they
could be moved into core.

But I would also assume that at first, this area would be sufficiently
experimental that folks (like us) would write our own extensions for it.


>
> >
> > But hey, I think there are committers working for postgrespro, who
> might
> > have the motivation to get this over the line. Of course, assuming
> that
> > there are no serious objections to having this functionality or how
> it's
> > implemented ... But I don't think that was the case.
> >
> >
> > While I am not currently able to speak for questions of how it is
> > implemented, I can say with very little doubt that we would almost
> > certainly use this functionality if it were there and I could see plenty
> > of other cases where this would be a very appropriate direction for some
> > other projects as well.
> >
> Well, I guess the best thing you can do to move this patch forward is to
> actually try that on your real-world use case, and report your results
> and possibly do a review of the patch.
>

Yeah, I expect to do this within the next month or two.


>
> IIRC there was an extension [1] leveraging this custom compression
> interface for better jsonb compression, so perhaps that would work for
> you (not sure if it's up to date with the current patch, though).
>
> [1]
>
> https://www.postgresql.org/message-id/20171130182009.1b492eb2%40wp.localdomain
>
Yeah, I will be looking at a couple of different approaches here and
reporting back.  I don't expect it will be a full production workload, but I
do expect to be able to report on benchmarks for both storage and performance.


>
> regards
>
> --
> Tomas Vondra  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: [HACKERS] Custom compression methods

2019-03-19 Thread Chris Travers
On Mon, Mar 18, 2019 at 11:09 PM Tomas Vondra 
wrote:

>
>
> On 3/15/19 12:52 PM, Ildus Kurbangaliev wrote:
> > On Fri, 15 Mar 2019 14:07:14 +0400
> > David Steele  wrote:
> >
> >> On 3/7/19 11:50 AM, Alexander Korotkov wrote:
> >>> On Thu, Mar 7, 2019 at 10:43 AM David Steele  >>> <mailto:da...@pgmasters.net>> wrote:
> >>>
> >>> On 2/28/19 5:44 PM, Ildus Kurbangaliev wrote:
> >>>
> >>>  > there are another set of patches.
> >>>  > Only rebased to current master.
> >>>  >
> >>>  > Also I will change status on commitfest to 'Needs review'.
> >>>
> >>> This patch has seen periodic rebases but no code review that I
> >>> can see since last January 2018.
> >>>
> >>> As Andres noted in [1], I think that we need to decide if this
> >>> is a feature that we want rather than just continuing to push it
> >>> from CF to CF.
> >>>
> >>>
> >>> Yes.  I took a look at code of this patch.  I think it's in pretty
> >>> good shape.  But high level review/discussion is required.
> >>
> >> OK, but I think this patch can only be pushed one more time, maximum,
> >> before it should be rejected.
> >>
> >> Regards,
> >
> > Hi,
> > in my opinion this patch is usually skipped not because it is not
> > needed, but because of its size. It is not hard to maintain it until
> > commiters will have time for it or I will get actual response that
> > nobody is going to commit it.
> >
>
> That may be one of the reasons, yes. But there are other reasons, which
> I think may be playing a bigger role.
>
> There's one practical issue with how the patch is structured - the docs
> and tests are in separate patches towards the end of the patch series,
> which makes it impossible to commit the preceding parts. This needs to
> change. Otherwise the patch size kills the patch as a whole.
>
> But there's a more important cost/benefit issue, I think. When I look at
> patches as a committer, I naturally have to weight how much time I spend
> on getting it in (and then dealing with fallout from bugs etc) vs. what
> I get in return (measured in benefits for community, users). This patch
> is pretty large and complex, so the "costs" are quite high, while the
> benefits from the patch itself is the ability to pick between pg_lz and
> zlib. Which is not great, and so people tend to pick other patches.
>
> Now, I understand there's a lot of potential benefits further down the
> line, like column-level compression (which I think is the main goal
> here). But that's not included in the patch, so the gains are somewhat
> far in the future.
>

Not discussing whether any particular committer should pick this up but I
want to discuss an important use case we have at Adjust for this sort of
patch.

The PostgreSQL compression strategy is something we find inadequate for at
least one of our large deployments (a large debug log spanning 10PB+).  Our
current solution is to set storage so that it does not compress and then
run on ZFS to get compression speedups on spinning disks.
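
Concretely, the "do not compress" part is just per-column storage settings
along these lines (table and column names are illustrative):

-- keep out-of-line TOAST storage but skip PostgreSQL's built-in compression,
-- leaving compression to ZFS underneath
ALTER TABLE debug_log ALTER COLUMN payload SET STORAGE EXTERNAL;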

But running PostgreSQL on ZFS has some annoying costs because we have
copy-on-write on copy-on-write, and when you add file fragmentation... I
would really like to be able to get away from having to do ZFS as an
underlying filesystem.  While we have good write throughput, read
throughput is not as good as I would like.

An approach that would give us better row-level compression would allow us
to ditch the COW-filesystem-under-PostgreSQL approach.

So I think the benefits are actually quite high, particularly for those
dealing with volume/variety problems where something like JSONB is the go-to
solution.  Similarly, I could easily see systems that handle large amounts of
specialized text having extensions for dealing with it.

>
> But hey, I think there are committers working for postgrespro, who might
> have the motivation to get this over the line. Of course, assuming that
> there are no serious objections to having this functionality or how it's
> implemented ... But I don't think that was the case.
>

While I am not currently able to speak to questions of how it is implemented,
I can say with very little doubt that we would almost certainly use this
functionality if it were there, and I can see plenty of other projects for
which this would be a very appropriate direction as well.


> regards
>
> --
> Tomas Vondra  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
>

-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: Data-only pg_rewind, take 2

2019-03-18 Thread Chris Travers
On Mon, Mar 18, 2019 at 4:09 AM Michael Paquier  wrote:

> On Sun, Mar 17, 2019 at 09:00:57PM +0800, Chris Travers wrote:
> > I also added test cases and some docs.  I don't know if the docs are
> > sufficient.  Feedback is appreciated.
>
> To be honest, I don't think that this approach is a good idea per the
> same reasons as mentioned the last time, as this can cause pg_rewind
> to break if any newly-added folder in the data directory has
> non-replaceable data which is needed at the beginning of recovery and
> cannot be automatically rebuilt.  So that's one extra maintenance
> burden to worry about.
>

Actually I think this is safe.  Let me go through the cases not handled in
the current behavior at all:

1.  For the RPMs we distribute, a rewind clobbers the database logs, which
means we overwrite application logs on the failed system with copies of logs
from a replica.  This means that after you rewind, you lose the ability to
figure out what went wrong.  That is an exquisitely bad idea unless you like
data loss, and since this location is configurable you can't just say "well,
we put our logs here, so we are excluding them."  Making this configurable
per rewind run strikes me as error-prone and something that may lead to
hidden interference with postmortems in the future, and postmortems are
vitally important to running database clusters with any sort of reliability
guarantee.

2.  With postgresql.auto.conf now holding what used to be recovery.conf
information, you have some very significant failure cases around replication
if it gets accidentally clobbered.

On to the corner cases with --data-only enabled, and the implications as I
see them, since this mode preserves files on the old master but does not copy
them from the replica:

1.  If the changes are not WAL-logged (let's say CSVs created using a file
foreign data wrapper), then deleting the files on rewind is where you can
lose data, and --data-only avoids this.  Here you *avoid* data loss in the
case where you put state files on the system and they are not rewound,
because they were not WAL-logged.  However, the files still exist on the old
master and are not deleted, so the data can easily be restored at that point.
Now, we can probably say that putting non-WAL-logged data files in $PGDATA is
a bad idea.  But even if you put them somewhere else, pg_rewind isn't going
to magically move them over to the replica for you.

2.  If the changes *are* WAL-logged, then you have a problem with --data-only
which is not present without it, namely that files can get out of sync with
their WAL-logged updates.  So in this case, --data-only is *not* safe.

So here I think we have a choice to make.  For now I don't feel comfortable
changing the default behavior, but the default behavior could cause data loss
in certain cases (including the ones I think you are concerned about).  Maybe
it would be better if I documented the above points?


>
> Here is the reference of the last thread about the same topic:
>
> https://www.postgresql.org/message-id/can-rpxd8y7hmojzd93hoqv6n8kpeo5cmw9gym+8jirtpifn...@mail.gmail.com
> --
> Michael
>


-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Data-only pg_rewind, take 2

2019-03-17 Thread Chris Travers
Hi;

Attached is my second attempt at the pg_rewind change, which allows one to
copy only a minimal set of files.  To my understanding, all past feedback has
been addressed.

The current patch does not change the default behavior.  It adds a
--data-only flag which makes pg_rewind rewind only the minimal set of files
needed for the cluster to work.  I believe this would generally be good
practice, though maybe there is some disagreement on that.

I have not run pgindent because of the large number of other changes, but if
that is desired at some point I can do that.

I also added test cases and some docs.  I don't know if the docs are
sufficient.  Feedback is appreciated.  This is of course submitted for v13.

-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


rewind_data_only_mode.patch
Description: Binary data


Re: Re: Re: [HACKERS] Custom compression methods

2019-03-15 Thread Chris Travers
On Fri, Mar 15, 2019 at 6:07 PM David Steele  wrote:

> On 3/7/19 11:50 AM, Alexander Korotkov wrote:
> > On Thu, Mar 7, 2019 at 10:43 AM David Steele  > <mailto:da...@pgmasters.net>> wrote:
> >
> > On 2/28/19 5:44 PM, Ildus Kurbangaliev wrote:
> >
> >  > there are another set of patches.
> >  > Only rebased to current master.
> >  >
> >  > Also I will change status on commitfest to 'Needs review'.
> >
> > This patch has seen periodic rebases but no code review that I can
> see
> > since last January 2018.
> >
> > As Andres noted in [1], I think that we need to decide if this is a
> > feature that we want rather than just continuing to push it from CF
> > to CF.
> >
> >
> > Yes.  I took a look at code of this patch.  I think it's in pretty good
> > shape.  But high level review/discussion is required.
>
> OK, but I think this patch can only be pushed one more time, maximum,
> before it should be rejected.
>

As a note, we at Adjust believe this would be very helpful for some of our
use cases and for some more general use cases as well.  I think custom
compression methods are a good feature, but we are not the only ones with an
interest here, and we would be interested in pushing this forward if possible
or in finding ways to contribute to better approaches in this particular
field.

>
> Regards,
> --
> -David
> da...@pgmasters.net
>
>

-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: Ltree syntax improvement

2019-03-07 Thread Chris Travers
On Tue, Feb 5, 2019 at 2:27 PM Nikolay Shaplov  wrote:

> В письме от вторник, 29 января 2019 г. 20:43:07 MSK пользователь Dmitry
> Belyavsky написал:
>
> > Please find attached the patch extending the sets of symbols allowed in
> > ltree labels. The patch introduces 2 variants of escaping symbols, via
> > backslashing separate symbols and via quoting the labels in whole for
> > ltree, lquery and ltxtquery datatypes.
>
> Hi!
>
> Let's me join the review process...
>
> I've just give a superficial look to the patch, read it through, and try
> here
> and there, so first I'll give feedback about exterior features of the
> patch.
>
> So, the first question: why do we need all this? The answer seems to be
> obvious, but it would be good say it explicitly. What problems would it
> solve?
> Do these problems really exists?
>

Yes, it does.  We maintain an extension (https://github.com/adjust/wltree)
which has a fixed separator (::) and allows any utf-8 character in the tree.

In our case we currently use our extended tree to store user-defined
hierarchies of labels, which might contain whitespace, Chinese, Japanese,
or Korean characters, etc.

I would love to be able to deprecate our work on this extension and
eventually use stock ltree.

>
> The second question is about coding style. As far as I can see you've
> passed
> your code through pg_indent, but it does not solve all problems.
>
> As far as I know, postgres commiters are quite strict about code width,
> the
> code should not be wider that 80 col, except places were string constants
> are
> introduced (There you can legally exceed 80 col). So I would advice to use
> vim
> with  tabstop=4  and colorcolumn=80 and make sure that new code text does
> not
> cross red vertical line.
>
> Comments. There is no fixed coding style for comments in postgres. Usually
> this
> means that it is better to follow coding in the code around. But ltree
> does
> not have much comments, so I would suggest to use the best style I've seen
> in
> the code
> /*
>  * [function name]
>  * < tab > [Short description, what does it do]
>  *
>  * [long explanation how it works, several subparagraph if needed
>  */
> (Seen this in src/include/utils/rel.h)
> or something like that.
> At least it would be good to have explicit separation between descriptions
> and
> explanation of how it works.
>
> And about comment
> /*
>  * Function calculating bytes to escape
>  * to_escape is an array of "special" 1-byte symbols
>  * Behvaiour:
>  * If there is no "special" symbols, return 0
>  * If there are any special symbol, we need initial and final quote, so
> return
> 2
>  * If there are any quotes, we need to escape all of them and also initial
> and
> final quote, so
>  * return 2 + number of quotes
>  */
>
> It has some formatting. But it should not.  As far as I understand this
> comment should be treated as single paragraph by reformatter, and
> refomatted.
> I do not understand why pg_indent have not done it. Documentation (https://
> www.postgresql.org/docs/11/source-format.html) claims that if you want to
> avoid reformatting, you should add "-" at the beginning and at
> the
> end of the comment. So it would be good either remove this formatting, or
> add
> "" if you are sure you want to keep it.
>
> Third part is about error messages.
> First I've seen errors like elog(ERROR, "internal error during splitting
> levels");. I've never seen error like that in postgres. May be if this
> error
> is in the part of the code, that should never be reached in real live, may
> be
> it would be good to change it to Assert? Or if this code can be normally
> reached, add some more explanation of what had happened...
>
> About other error messages.
>
> There are messages like
> errmsg("syntax error"),
> errdetail("Unexpected end of line.")));
>
> or
>
> errmsg("escaping syntax error")));
>
>
> In postgres there is error message style guide
> https://www.postgresql.org/docs/11/error-style-guide.html
>
> Here I would expect messages like that
>
> Primary:Error parsing ltree path
> Detail: Curly quote is opened and not closed
>
> Or something like that, I am not expert in English, but this way it would
> be
> better for sure. Key points are: Primary message should point to general
> area
> where error occurred (there can be a large SQL with a lot of things in it,
> so
> it is good to point that problem is with ltree staff). And is is better to
> word from the point of view of a user. What for you (and me) is unexpected
> e

Re: Proposal for Signal Detection Refactoring

2019-03-06 Thread Chris Travers
Here's a new patch.  No rush on it.  I am moving it to the next commitfest
anyway because, as code documentation, I think this is low priority late in
the release cycle.

The changes mostly address Andres's feedback above.

-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


signal_readme.patch
Description: Binary data


Re: Prevent extension creation in temporary schemas

2019-03-06 Thread Chris Travers
On Wed, Mar 6, 2019 at 9:33 AM Chris Travers 
wrote:

>
>
>> Thoughts?
>>
>
To reiterate, my experience is that people doing particularly exotic work in
PostgreSQL can expect to hit equally exotic bugs.  I have a list that I will
not bore people with here.

I think there is a general consensus here that creating extensions in temp
schemas is something we would like to support.  So I think we should fix
these bugs before we make it easy to do.  And this patch addresses one of
those.

--
>> Michael
>>
>
>
> --
> Best Regards,
> Chris Travers
> Head of Database
>
> Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
> Saarbrücker Straße 37a, 10405 Berlin
>
>

-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: Prevent extension creation in temporary schemas

2019-03-06 Thread Chris Travers
On Wed, Mar 6, 2019 at 3:19 AM Michael Paquier  wrote:

> On Tue, Mar 05, 2019 at 12:47:54PM +0000, Chris Travers wrote:
> > I tried installing a test extension into a temp schema.  I found
> > this was remarkably difficult to do because pg_temp did not work (I
> > had to create a temporary table and then locate the actual table it
> > was created in).  While that might also be a bug it is not in the
> > scope of this patch so mostly noting in terms of future work.
>
> pgcrypto works in this case.
>

So the issue here is in finding the pg_temp schema to install into.  The
extension itself is less of an issue.

The point of my note above is that there are other sharp corners that have
to be rounded off in order to make this work really well.
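
For reference, the workaround I used looks roughly like this (the schema name
will differ per session):

-- force the session's temporary schema to exist, then look up its real name
CREATE TEMP TABLE probe ();
SELECT pg_my_temp_schema()::regnamespace;  -- e.g. pg_temp_3
-- then: CREATE EXTENSION pgcrypto WITH SCHEMA pg_temp_3;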

>
> > After creating the extension I did as follows:
> > \dx in the current session shows the extension
> > \dx in a stock psql shows the extension in a separate session
> > \dx with a patched psql in a separate session does not show the
> > extension.
> >
> > In terms of the scope of this patch, I think this correctly and
> > fully solves the problem at hand.
>
> I was just looking at this patch this morning with fresh eyes, and I
> think that I have found one argument to *not* apply it.  Imagine the
> following in one session:
> =# create extension pgcrypto with schema pg_temp_3;
> CREATE EXTENSION
> =# \dx
>   List of installed extensions
>Name   | Version |   Schema   | Description
> --+-++--
>  pgcrypto | 1.3 | pg_temp_3  | cryptographic functions
>  plpgsql  | 1.0 | pg_catalog | PL/pgSQL procedural language
> (2 rows)
>
> That's all good, we see that the session which created this extension
> has it listed.  Now let's use in parallel a second session:
> =# create extension pgcrypto with schema pg_temp_4;
> ERROR:  42710: extension "pgcrypto" already exists
> LOCATION:  CreateExtension, extension.c:1664
> =# \dx
>   List of installed extensions
>Name   | Version |   Schema   | Description
> --+-++--
>  plpgsql  | 1.0 | pg_catalog | PL/pgSQL procedural language
> (1 row)
>
> This is actually also good, because the extension of the temporary
> schema of the first session does not show up.  Now I think that this
> can bring some confusion to the user actually, because the extension
> becomes not listed via \dx, but trying to create it with a different
> schema fails.
>

Ok so at present I see three distinct issues here, where maybe three
different patches over time might be needed.

The issues are:

1.  create extension pgcrypto with schema pg_temp; fails because there is
no schema actually named pg_temp.
2.  If you work around this, the \dx shows temporary extensions in other
sessions.  This is probably the most minor issue of the three.
3.  You cannot create the same extension in two different schemas.

My expectation is that this may be a situation where other sharp corners
are discovered over time.  My experience is that where things are difficult
to do in PostgreSQL and hence not common, these sharp corners exist
(domains vs constraints in table-based composite types for example,
multiple inheritance being another).

It is much easier to review patches if they make small, well defined
changes to the code.  I don't really have an opinion on whether this should
be applied as is, or moved to next commitfest in the hope we can fix issue
#3 there too.  But I would recommend not fixing the pg_temp naming (#1
above) until at least the other two are fixed.  There is no sense in making
this easy yet.  But I would prefer to review or write patches that address
these issues one at a time rather than try to get them all reviewed and
included together.

But I don't think there is likely to be a lot of user confusion here.  It is
hard enough to install extensions in temporary schemas that those who do can
be expected to know more than \dx commands.

>
> Thoughts?
> --
> Michael
>


-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: Prevent extension creation in temporary schemas

2019-03-05 Thread Chris Travers
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:not tested

I ran make checkworld and everything passed.

I tried installing a test extension into a temp schema.  I found this was
remarkably difficult to do because pg_temp did not work (I had to create a
temporary table and then locate the actual schema it was created in).  While
that might also be a bug, it is not in the scope of this patch, so I am mostly
noting it as future work.

After creating the extension I did as follows:
\dx in the current session shows the extension
\dx in a stock psql shows the extension in a separate session
\dx with a patched psql in a separate session does not show the extension.

In terms of the scope of this patch, I think this correctly and fully solves 
the problem at hand.

The new status of this patch is: Ready for Committer


Re: Prevent extension creation in temporary schemas

2019-02-18 Thread Chris Travers
On Mon, Feb 18, 2019 at 6:40 AM Kuroda, Hayato 
wrote:

> Dear Michael,
>
> I seem this patch is enough, but could you explain the reason
> you drop initial proposal more detail?
> I'm not sure why extensions contained by temporary schemas are acceptable.
>

Here's my objection.

Everything a relocatable extension can create can be created normally in a
temporary schema currently.  This includes types, functions, etc.

So I can create a type in a temporary schema and then create a table in a
public schema using that type for a column.  This behaves oddly (when I log
out of my session the column gets implicitly dropped), but it works
consistently.  Adding special cases for extensions strikes me as adding more
funny corners to the behavior of the database in this regard.

Now there are times I could imagine using temporary schemas with
extensions.  This could include testing multiple versions of an extension
so that multiple concurrent test runs don't see each other's versions.
This could be done with normal schemas but the guarantees are not as strong
regarding cleanup.

>
> > Anything depending on a temporary object will be dropped per
> > dependency links once the session is over.
>
> Extensions locate at pg_temp_* schemas are temporary objects IMO.
> How do you think? Would you implement this functionality in future?
>

That's the way things are now as far as I understand it, or do I
misunderstand your question?

>
> Hayato Kuroda
> Fujitsu LIMITED
>
>

-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: ALTER TABLE on system catalogs

2019-02-14 Thread Chris Travers
I have a couple of thoughts here.

On Fri, Feb 8, 2019 at 4:35 AM Kyotaro HORIGUCHI <
horiguchi.kyot...@lab.ntt.co.jp> wrote:

> At Fri, 08 Feb 2019 12:03:31 +0900 (Tokyo Standard Time), Kyotaro
> HORIGUCHI  wrote in <
> 20190208.120331.167280496.horiguchi.kyot...@lab.ntt.co.jp>
> > By the way, I'm confused to see that attributes that don't want
> > to go external are marked as 'x' in system catalogs. Currently
> > (putting aside its necessity) the following operation ends with
> > successful attaching a new TOAST relation, which we really don't
> > want.
> >
> > ALTER TABLE pg_attribute ALTER COLUMN attrelid SET STORAGE plain;
> >
> > Might be silly, we may need another storage class, say,
> > Compression, which means try compress but don't go external.
> >
> > The attached patch does that.
> >
> > - All varlen fields of pg_class and pg_attribute are marked as
> >   'c'.  (Catalog.pm, genbki.pl, genbki.h, pg_attribute/class.h)
> >
> > - Try compress but don't try external for 'c' storage.
> >   (tuptoaster.c, toasting.c)
> >
> >
> > We could remove toast relation when no toastable attribute
> > remains after ATLER TABLE ALTER COLUMN SET STOAGE, but it doesn't
> > seem that simple. So the storage class is for internal use only.
>

I guess there is a serious question about why this is for internal use only.
Are there any particular cases where it would be problematic for those who
know what they are doing to set this?  If not, maybe it should be included
in the docs?

>
> I found that "ALTER TABLE.. SET STORAGE plain" doesn't remove
> toast relation, so it's not so bad even if compress does the
> same?
>
> The attached 0001 is fixed from the previous version. 0002 is the
> syntax.
>

I agree that we need to support setting options on system catalog tables.  We
have some cases where we want to disable autovacuum on some tablespaces but
set it aggressively on a couple of system catalogs.  Currently this is not
really doable in any sane way.
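
For example, this is the kind of per-catalog tuning we would like to be able
to rely on; today it requires allow_system_table_mods and, as this thread
shows, ALTER TABLE on catalogs can still trip over the TOAST-table handling:

ALTER TABLE pg_catalog.pg_attribute
  SET (autovacuum_vacuum_scale_factor = 0.01,
       autovacuum_vacuum_threshold = 1000);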

I also think that if the current catalogs violate expectations regarding
precondition checks, this needs to be corrected rather than special-cased,
and this seems to be the best way forward.

>
> regards.
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center
>


-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: Challenges preventing us moving to 64 bit transaction id (XID)?

2019-02-13 Thread Chris Travers
On Thu, Mar 1, 2018 at 11:48 PM Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> Hi!
>
> On Fri, Mar 2, 2018 at 1:41 AM, Andres Freund  wrote:
>
>> On 2018-01-11 01:02:52 +0300, Alexander Korotkov wrote:
>> > As I get from cputube, patchset doesn't compiles again.  Please find
>> > revised version attached.
>>
>> It'd be good if you could maintain the patches as commits with some
>> description of why you're doing these changes.  It's a bit hard to
>> figure that out sometimes and having to dig through a thread with quite
>> some discussion certainly makes it less likely to be looked at.
>>
>
> Thank you for pointing.  In future I'll maintain patches with their
> description.
> BTW, during FOSDEM developer meeting we decided that I should extract
> 64-bit in-memory representation of xids and resubmit it, while 64-bit
> on-disk reprensentation should become a pluggable table access method.
> I didn't manage to do this before current commitfest.  Also, the last
> commitfest
> is already too late for such big changes.  So, I'm marking this RWF.
>

I am wondering about this point.  If there are different access methods,
what does that mean for transactions crossing the 32-bit mark?  Is it
possible that supporting 32-bit and 64-bit representations together could
cause visibility problems and bugs?  Or is the idea that pages would be
upgraded on write?


>
> --
> Alexander Korotkov
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>


-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: Challenges preventing us moving to 64 bit transaction id (XID)?

2019-02-13 Thread Chris Travers
On Fri, Mar 2, 2018 at 12:07 AM Andres Freund  wrote:

> On 2018-03-02 01:56:00 +0300, Alexander Korotkov wrote:
> > On Fri, Mar 2, 2018 at 1:51 AM, Andres Freund 
> wrote:
> >
> > > On 2018-03-02 01:48:03 +0300, Alexander Korotkov wrote:
> > > > Also, the last commitfest is already too late for such big changes.
> > > > So, I'm marking this RWF.
> > >
> > > Agreed.  Perhaps extract the 64bit GUC patch and track that separately?
> > > Seems like something we should just do...
> > >
> >
> > Sounds reasonable.  But I didn't notice if there are other users for
> 64bit
> > GUCs besides 64bit xids?
>
> I think there were a couple past occasions where we could've used that,
> don't quite recall the details. We're at least not that far away from
> the point where various size limits are actually limited by int32
> range. And timeouts of ~25 days are long but not entirely unreasonable.
>

As a note here, I have worked on projects where there could be 2-week-long
idle-in-transaction states (no joke, we tuned things to only use virtual
xids for most of that time), and being able to set idle-in-transaction
timeouts to more than a month is something I could imagine doing.  I would
certainly favor the idea of 64-bit GUC variables as a general rule.
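
For what it's worth, the int32 millisecond GUCs top out just short of 25
days, which is exactly the sort of ceiling I mean:

-- the largest value an int32 millisecond GUC can hold today: ~24.8 days
SET idle_in_transaction_session_timeout = 2147483647;
-- anything on the order of a month or more would need 64-bit GUCs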

>
> Greetings,
>
> Andres Freund
>
>

-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: Prevent extension creation in temporary schemas

2019-02-13 Thread Chris Travers
On Sat, Jan 12, 2019 at 12:48 AM Michael Paquier 
wrote:

> On Sat, Jan 12, 2019 at 08:34:37AM +0900, Michael Paquier wrote:
> > Then the extension is showing up as beginning to be present for other
> > users.  I am mainly wondering if this case has actually been thought
> > about in the past or discussed, and what to do about that and if we
> > need to do something.
>
> The point here is about the visibility in \dx.
>

If the point is visibility in \dx it seems to me we want to fix the \dx
query.

This is actually a very interesting set of problems and behavior is not
intuitive here in PostgreSQL.  I wonder how much more inconsistency we want
to add.

For example: suppose I create a type in pg_temp and create a table in
public with a column using that type.

What is the expected visibility in other sessions?

What happens to the table when I log out?

I went ahead and tested that case and found the behavior to be, well,
unintuitive.  The temporary type is visible to other sessions, and the
column is implicitly dropped when the type falls out of session scope.
Whether or not we want to prevent that, I think that having special casing
here for extensions makes this behavior even more inconsistent.  I guess I
would vote against accepting this patch as it is.

> --
> Michael
>


-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: Commit Fest 2019-01 is now closed

2019-02-08 Thread Chris Travers
On Fri, Feb 8, 2019 at 9:40 AM Tsunakawa, Takayuki <
tsunakawa.ta...@jp.fujitsu.com> wrote:

> From: Bruce Momjian [mailto:br...@momjian.us]
> > I am thinking we should see which items we really want for PG 12 _now_
> > and allocate resources/help to get them done, rather than being
> > surprised they didn't make it.  I am glad we are in good shape with
> > CTEs, since that has been a long-requested feature.
>
> I want the partitioning performance to be comparable to a certain
> commercial DBMS, and Horiguchi-san's "protect syscache ..." to limit the
> upper size of syscache/relcache.  Otherwise, our organization may be put in
> a difficult situation...  Anyway, I recognize my colleagues and I also have
> to review others' patches.
>
> I'm sorry if I repeat what someone proposed in the past, but it would be
> nice if the CF app could show the number of modified lines (added +
> deleted) for each patch, e.g., the last line of diffstat command output.
> That would help young newbies to choose patches.
>

An important prerequisite here would be to list all patches from the most
recent email.  Right now it lists just one, so you have to follow the link
to the -hackers archive entry and look at each patch one at a time.

So if we could grab all attachments ending in .diff or .patch and list line
counts, that would be a welcome improvement.


>
> Regards
> Takayuki Tsunakawa
>
>
>
>

-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: Prevent extension creation in temporary schemas

2019-02-04 Thread Chris Travers
This could probably use a quick note in the docs.

Re: Proposal for Signal Detection Refactoring

2019-01-23 Thread Chris Travers
Attached is a new signal handling patch.  The path is corrected and the file
moved.  The documentation is slightly streamlined in some places and expanded
in others.

Feedback requested.
-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


signal_readme_patch.diff
Description: Binary data


Re: Proposal for Signal Detection Refactoring

2019-01-17 Thread Chris Travers
On Thu, Jan 17, 2019 at 7:08 PM Andres Freund  wrote:

> Hi,
>
> On 2018-10-09 16:04:35 +0200, Peter Eisentraut wrote:
> > More generally, I'd like this material to be code comments.  It's the
> > kind of stuff that gets outdated before long if it's kept separate.
>
> I'm not sure I buy this here - we don't have (but perhaps should?) a
> convenient location for an overview comment around this. There's no
> "signal_handling.c" where it'd clearly belong - given the lack of a
> clear point to look to, I don't think a README.SIGNAL_HANDLING would get
> out-of-date more quickly than code comments in mildly related place (say
> postgres.c or miscinit.c) would get out of date at a different pace.
>

The other point is that "This is the way to do it" is a little easier when
in a README rather than in code comments sine those might be scattered in a
bunch of different places.

>
> Greetings,
>
> Andres Freund
>


-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: Proposal for Signal Detection Refactoring

2019-01-17 Thread Chris Travers
Latest patch, simpler but answers the key questions as code comments

On Mon, Dec 3, 2018 at 6:56 AM Chris Travers 
wrote:

> On Thu, Nov 29, 2018 at 8:46 PM Dmitry Dolgov <9erthali...@gmail.com>
> wrote:
>
>> > On Wed, Oct 10, 2018 at 7:10 PM Chris Travers 
>> wrote:
>> >
>> >> More generally, I'd like this material to be code comments.  It's the
>> >> kind of stuff that gets outdated before long if it's kept separate.
>> >
>> > The problem is that code comments are not going to be good places to
>> document "how do I check for pending actions?"  That could be moved to the
>> main SGML I guess.
>>
>> I agree with Peter here, for me it also feels more natural to have this
>> information as code commentaries - at least if I would search for it that
>> would
>> be my first thought. As for "how do I..." part, I think there are already
>> similar commentaries in the code, which makes sense - this kind of
>> questions
>> usually appear when you're reading/writing some code.
>>
>> It doesn't look like there is much left to do in this discussion, but for
>> now
>> I'll move it to the next CF.
>>
>
> Ok will move this into code comments if there are no objections.
>
>
> --
> Best Regards,
> Chris Travers
> Head of Database
>
> Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
> Saarbrücker Straße 37a, 10405 Berlin
>
>

-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


signal-comments.patch
Description: Binary data


Re: Proposal for Signal Detection Refactoring

2018-12-02 Thread Chris Travers
On Thu, Nov 29, 2018 at 8:46 PM Dmitry Dolgov <9erthali...@gmail.com> wrote:

> > On Wed, Oct 10, 2018 at 7:10 PM Chris Travers 
> wrote:
> >
> >> More generally, I'd like this material to be code comments.  It's the
> >> kind of stuff that gets outdated before long if it's kept separate.
> >
> > The problem is that code comments are not going to be good places to
> document "how do I check for pending actions?"  That could be moved to the
> main SGML I guess.
>
> I agree with Peter here, for me it also feels more natural to have this
> information as code commentaries - at least if I would search for it that
> would
> be my first thought. As for "how do I..." part, I think there are already
> similar commentaries in the code, which makes sense - this kind of
> questions
> usually appear when you're reading/writing some code.
>
> It doesn't look like there is much left to do in this discussion, but for
> now
> I'll move it to the next CF.
>

Ok will move this into code comments if there are no objections.


-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: Proposal for Signal Detection Refactoring

2018-10-10 Thread Chris Travers
On Tue, Oct 9, 2018 at 4:04 PM Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 01/10/2018 14:00, Chris Travers wrote:
> >
> >
> > On Wed, Sep 26, 2018 at 9:54 AM Chris Travers  > <mailto:chris.trav...@adjust.com>> wrote:
> >
> >
> >
> > On Tue, Sep 25, 2018 at 3:23 PM Tom Lane  > <mailto:t...@sss.pgh.pa.us>> wrote:
> >
> > Chris Travers  > <mailto:chris.trav...@adjust.com>> writes:
> > > However,  what I think one could do is use a struct of volatile
> > > sig_atomic_t members and macros for checking/setting.  Simply
> > writing a
> > > value is safe in C89 and higher.
> >
> > Yeah, we could group those flags in a struct, but what's the
> point?

>
> >
> > This was one of two things I noticed in my previous patch on
> > interrupts and loops where I wasn't sure what the best practice in
> > our code is.
> >
> > If we don't want to make this change, then would there be any
> > objection to me writing up a README describing the flags, and best
> > practices in terms of checking them in our code based on the current
> > places we use them?  If the current approach will be unlikely to
> > change in the future, then at least we can document that the way I
> > went about things is consistent with current best practices so next
> > time someone doesn't really wonder.
> >
> >
> > Attaching a first draft of a readme.  Feedback welcome.  I noticed
> > further that we used to document signals and what they did with various
> > processes but that this was removed after 7.0, probably due to
> > complexity reasons.
>
> diff --git a/src/backend/utils/init/README b/src/backend/utils/init/README
> new file mode 100644
> index 00..0472687f53
> --- /dev/null
> +++ b/src/backend/utils/init/README
> @@ -0,0 +1,96 @@
> +src/backend/utils/misc/README
>
> Not only are the directory names mismatched here, but I wonder whether
> this file would be long in either of these directories.
>

I am open to suggestions here.  The file was put where the signal stuff
was.  Maybe moving it up and calling it signals.README?

>
> +Implementation Notes on Globals and Signal/Event Handling
> +=
> +
> +The approch to signal handling in PostgreSQL is designed to strictly
> conform
> +with the C89 standard and designed to run with as few platform
> assumptions as
> +possible.
>
> We use C99 now, so why would be design this to be conforming to C89?
>

Can of worms alert:  C89 and C99 are functionally the same here.  C99
changed the spec in part because every known C89 implementation was
compliant in this area with the C99 spec.

I am fine with bumping that up to C99.

>
>
> More generally, I'd like this material to be code comments.  It's the
> kind of stuff that gets outdated before long if it's kept separate.
>

The problem is that code comments are not going to be good places to
document "how do I check for pending actions?"  That could be moved to the
main SGML I guess.

>
> --
> Peter Eisentraut  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Possible important data point on stats collection, wondering about possible improvement

2018-10-04 Thread Chris Travers
Hi;

System is PostgreSQL 10.5, all partitioning done the old way (via
inheritance).

Last month we had some performance issues caused by statistics being out of
date and the planner choosing the wrong index for a large number of
queries.  The proximal fix was to increase the stats target from 1000 to
1 and analyze which prevents the problem from continuing to manifest.
Looking back at the graphs though I notice a number of things which make me
think that maybe there might be ways of improving the situation without
increasing stats targets.

I wanted to bring up the question here and see if there were opportunities
to work together on improving the situation.

What I measured was the difference between the maximum value in the
statistics histogram and the maximum value in the table.  The relevant
field is a timestamp field, so calculating lag is straightforward and
gives us a percentage of the table that is outside stats.

What I noticed was that the major analytics tables seem to fall into two
groups:
1.  One group has actual clear sampling issues as evidenced by the fact
that the difference in values swung wildly around.  One of these tables had
52 million rows spread between two partitions (1M and 51M respectively).
On this group I understand the need to set stats targets up.

2.  A second group saw a more mild increase in lag between max value
recorded in stats and max value in the db.  However what struck me about
this was that the lag seemed fairly linear.  In other words, the
fluctuations were within a relatively narrow range and the lag seemed to
grow linearly with time.  These tables were actually larger (one typical
one was 60M rows split between a partition of 51M rows and one of 9M rows).

The workload for the database in question is heavily update driven so I
would expect fewer sampling bias problems than might happen for insert-only
workloads.

The second case puzzles me.  I have been looking carefully into how the
stats collector works and I cannot find anything that could account for a
near-linear increase in statics missing recent data.  What starts out in a
60M row table with default stats targets (1000) ends up going about 10% off
over time.

What I am wondering is whether it would make any sense whatsoever to expand
the stats to include min and max values found in a scan, or whether it
would make more sense to try to help the planner extrapolate from existing
stats in a more efficient way.

-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: [HACKERS] Transactions involving multiple postgres foreign servers, take 2

2018-10-03 Thread Chris Travers
On Wed, Oct 3, 2018 at 9:41 AM Chris Travers 
wrote:

> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, failed
> Implements feature:   not tested
> Spec compliant:   not tested
> Documentation:tested, failed
>
> I am hoping I am not out of order in writing this before the commitfest
> starts.  The patch is big and long and so wanted to start on this while
> traffic is slow.
>
> I find this patch quite welcome and very close to a minimum viable
> version.  The few significant limitations can be resolved later.  One thing
> I may have missed in the documentation is a discussion of the limits of the
> current approach.  I think this would be important to document because the
> caveats of the current approach are significant, but the people who need it
> will have the knowledge to work with issues if they come up.
>
> The major caveat I see in our past discussions and (if I read the patch
> correctly) is that the resolver goes through global transactions
> sequentially and does not move on to the next until the previous one is
> resolved.  This means that if I have a global transaction on server A, with
> foreign servers B and C, and I have another one on server A with foreign
> servers C and D, if server B goes down at the wrong moment, the background
> worker does not look like it will detect the failure and move on to try to
> resolve the second, so server D will have a badly set vacuum horizon until
> this is resolved.  Also if I read the patch correctly, it looks like one
> can invoke SQL commands to remove the bad transaction to allow processing
> to continue and manual resolution (this is good and necessary because in
> this area there is no ability to have perfect recoverability without
> occasional administrative action).  I would really like to see more
> documentation of failure cases and appropriate administrative action at
> present.  Otherwise this is I think a minimum viable addition and I think
> we want it.
>
> It is possible I missed that in the documentation.  If so, my objection
> stands aside.  If it is welcome I am happy to take a first crack at such
> docs.
>

After further testing I am pretty sure I misread the patch.  It looks like
one can have multiple resolvers which can, in fact, work through a queue
together solving this problem.  So the objection above is not valid and I
withdraw that objection.  I will re-review the docs in light of the
experience.


>
> To my mind that's the only blocker in the code (but see below).  I can say
> without a doubt that I would expect we would use this feature once
> available.
>
> --
>
> Testing however failed.
>
> make installcheck-world fails with errors like the following:
>
>  -- Modify foreign server and raise an error
>   BEGIN;
>   INSERT INTO ft7_twophase VALUES(8);
> + ERROR:  prepread foreign transactions are disabled
> + HINT:  Set max_prepared_foreign_transactions to a nonzero value.
>   INSERT INTO ft8_twophase VALUES(NULL); -- violation
> ! ERROR:  current transaction is aborted, commands ignored until end of
> transaction block
>   ROLLBACK;
>   SELECT * FROM ft7_twophase;
> ! ERROR:  prepread foreign transactions are disabled
> ! HINT:  Set max_prepared_foreign_transactions to a nonzero value.
>   SELECT * FROM ft8_twophase;
> ! ERROR:  prepread foreign transactions are disabled
> ! HINT:  Set max_prepared_foreign_transactions to a nonzero value.
>   -- Rollback foreign transaction that involves both 2PC-capable
>   -- and 2PC-non-capable foreign servers.
>   BEGIN;
>   INSERT INTO ft8_twophase VALUES(7);
> + ERROR:  prepread foreign transactions are disabled
> + HINT:  Set max_prepared_foreign_transactions to a nonzero value.
>   INSERT INTO ft9_not_twophase VALUES(7);
> + ERROR:  current transaction is aborted, commands ignored until end of
> transaction block
>   ROLLBACK;
>   SELECT * FROM ft8_twophase;
> ! ERROR:  prepread foreign transactions are disabled
> ! HINT:  Set max_prepared_foreign_transactions to a nonzero value.
>
> make installcheck in the contrib directory shows the same, so that's the
> easiest way of reproducing, at least on a new installation.  I think the
> test cases will have to handle that sort of setup.
>
> make check in the contrib directory passes.
>
> For reasons of test failures, I am setting this back to waiting on author.
>
> --
> I had a few other thoughts that I figure are worth sharing with the
> community on this patch with the idea that once it is in place, this may
> open up more options for collaboration in the area of federated and
> distributed storage generally.  I could imagine other foreign data wrappers
> using this API, and folks might want to refactor out the atomic handling
> part so that extensions that do not use the foreign data wrapper structure
> could use it as well (while this looks like a classic SQL/MED issue, I am
> not sure that only foreign data wrappers would be interested in the API).

Re: [HACKERS] Transactions involving multiple postgres foreign servers, take 2

2018-10-03 Thread Chris Travers
On Wed, Oct 3, 2018 at 9:56 AM Chris Travers 
wrote:

>
>
> On Wed, Oct 3, 2018 at 9:41 AM Chris Travers 
> wrote:
>
>>
>> (errmsg("preparing foreign transactions
> (max_prepared_foreign_transactions > 0) requires maX_foreign_xact_resolvers
> > 0")));
>

Two more critical notes here which I think are small blockers.

The error message above references a config variable that does not exist.

The correct name of the config parameter is
max_foreign_transaction_resolvers

 Setting that along with the following to 10 caused the tests to pass, but
again it fails on default configs:

max_prepared_foreign_transactions, max_prepared_transactions
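
For reference, the non-default settings I ended up with on the test instance
looked roughly like this (the values are simply what I happened to use, not
recommendations):

max_prepared_transactions = 10
max_prepared_foreign_transactions = 10
max_foreign_transaction_resolvers = 10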

>
>
>
> --
> Best Regards,
> Chris Travers
> Head of Database
>
> Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
> Saarbrücker Straße 37a, 10405 Berlin
>
>

-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: [HACKERS] Transactions involving multiple postgres foreign servers, take 2

2018-10-03 Thread Chris Travers
On Wed, Oct 3, 2018 at 9:41 AM Chris Travers 
wrote:

> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, failed
> Implements feature:   not tested
> Spec compliant:   not tested
> Documentation:tested, failed
>

Also one really minor point:  I think this is a typo (maX vs max)?

(errmsg("preparing foreign transactions (max_prepared_foreign_transactions
> 0) requires maX_foreign_xact_resolvers > 0")));




-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: [HACKERS] Transactions involving multiple postgres foreign servers, take 2

2018-10-03 Thread Chris Travers
The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:       not tested
Spec compliant:           not tested
Documentation:            tested, failed

I am hoping I am not out of order in writing this before the commitfest starts. 
 The patch is big and long, and so I wanted to start on this while traffic is slow.

I find this patch quite welcome and very close to a minimum viable version.  
The few significant limitations can be resolved later.  One thing I may have 
missed in the documentation is a discussion of the limits of the current 
approach.  I think this would be important to document because the caveats of 
the current approach are significant, but the people who need it will have the 
knowledge to work with issues if they come up.

The major caveat I see in our past discussions and (if I read the patch 
correctly) is that the resolver goes through global transactions sequentially 
and does not move on to the next until the previous one is resolved.  This 
means that if I have a global transaction on server A, with foreign servers B 
and C, and I have another one on server A with foreign servers C and D, if 
server B goes down at the wrong moment, the background worker does not look 
like it will detect the failure and move on to try to resolve the second, so 
server D will have a badly set vacuum horizon until this is resolved.  Also if 
I read the patch correctly, it looks like one can invoke SQL commands to remove 
the bad transaction to allow processing to continue and manual resolution (this 
is good and necessary because in this area there is no ability to have perfect 
recoverability without occasional administrative action).  I would really like 
to see more documentation of failure cases and appropriate administrative 
action at present.  Otherwise this is I think a minimum viable addition and I 
think we want it.

It is possible I missed that in the documentation.  If so, my objection stands 
aside.  If it is welcome I am happy to take a first crack at such docs.

To my mind that's the only blocker in the code (but see below).  I can say 
without a doubt that I would expect we would use this feature once available.

--

Testing however failed.

make installcheck-world fails with errors like the following:

 -- Modify foreign server and raise an error
  BEGIN;
  INSERT INTO ft7_twophase VALUES(8);
+ ERROR:  prepread foreign transactions are disabled
+ HINT:  Set max_prepared_foreign_transactions to a nonzero value.
  INSERT INTO ft8_twophase VALUES(NULL); -- violation
! ERROR:  current transaction is aborted, commands ignored until end of 
transaction block
  ROLLBACK;
  SELECT * FROM ft7_twophase;
! ERROR:  prepread foreign transactions are disabled
! HINT:  Set max_prepared_foreign_transactions to a nonzero value.
  SELECT * FROM ft8_twophase;
! ERROR:  prepread foreign transactions are disabled
! HINT:  Set max_prepared_foreign_transactions to a nonzero value.
  -- Rollback foreign transaction that involves both 2PC-capable
  -- and 2PC-non-capable foreign servers.
  BEGIN;
  INSERT INTO ft8_twophase VALUES(7);
+ ERROR:  prepread foreign transactions are disabled
+ HINT:  Set max_prepared_foreign_transactions to a nonzero value.
  INSERT INTO ft9_not_twophase VALUES(7);
+ ERROR:  current transaction is aborted, commands ignored until end of 
transaction block
  ROLLBACK;
  SELECT * FROM ft8_twophase;
! ERROR:  prepread foreign transactions are disabled
! HINT:  Set max_prepared_foreign_transactions to a nonzero value.

make installcheck in the contrib directory shows the same, so that's the 
easiest way of reproducing, at least on a new installation.  I think the test 
cases will have to handle that sort of setup.

make check in the contrib directory passes.

For reasons of test failures, I am setting this back to waiting on author.

--
I had a few other thoughts that I figure are worth sharing with the community 
on this patch with the idea that once it is in place, this may open up more 
options for collaboration in the area of federated and distributed storage 
generally.  I could imagine other foreign data wrappers using this API, and 
folks might want to refactor out the atomic handling part so that extensions 
that do not use the foreign data wrapper structure could use it as well (while 
this looks like a classic SQL/MED issue, I am not sure that only foreign data 
wrappers would be interested in the API).

The new status of this patch is: Waiting on Author


Re: Proposal for Signal Detection Refactoring

2018-10-01 Thread Chris Travers
Moving this to documentation due to a general consensus that abstracting this 
is not necessarily worth it.

If we don't want to refactor and abstract this, it is worth documenting the 
design as to how things work now so that others who face bugs can consult docs 
instead of trying to determine acceptable or recommended practices by looking 
at the code alone.

Re: Proposal for Signal Detection Refactoring

2018-10-01 Thread Chris Travers
On Wed, Sep 26, 2018 at 9:54 AM Chris Travers 
wrote:

>
>
> On Tue, Sep 25, 2018 at 3:23 PM Tom Lane  wrote:
>
>> Chris Travers  writes:
>> > However,  what I think one could do is use a struct of volatile
>> > sig_atomic_t members and macros for checking/setting.  Simply writing a
>> > value is safe in C89 and higher.
>>
>> Yeah, we could group those flags in a struct, but what's the point?
>>
>
> This was one of two things I noticed in my previous patch on interrupts
> and loops where I wasn't sure what the best practice in our code is.
>
> If we don't want to make this change, then would there be any objection to
> me writing up a README describing the flags, and best practices in terms of
> checking them in our code based on the current places we use them?  If the
> current approach will be unlikely to change in the future, then at least we
> can document that the way I went about things is consistent with current
> best practices so next time someone doesn't really wonder.
>

Attaching a first draft of a readme.  Feedback welcome.  I noticed further
that we used to document signals and what they did with various processes
but that this was removed after 7.0, probably due to complexity reasons.

>
>
>>
>> regards, tom lane
>>
>
>
> --
> Best Regards,
> Chris Travers
> Head of Database
>
> Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
> Saarbrücker Straße 37a, 10405 Berlin
>
>

-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


sig_doc_patch.patch
Description: Binary data


Re: Proposal for Signal Detection Refactoring

2018-09-26 Thread Chris Travers
On Tue, Sep 25, 2018 at 3:23 PM Tom Lane  wrote:

> Chris Travers  writes:
> > However,  what I think one could do is use a struct of volatile
> > sig_atomic_t members and macros for checking/setting.  Simply writing a
> > value is safe in C89 and higher.
>
> Yeah, we could group those flags in a struct, but what's the point?
>

This was one of two things I noticed in my previous patch on interrupts and
loops where I wasn't sure what the best practice in our code is.

If we don't want to make this change, then would there be any objection to
me writing up a README describing the flags, and best practices in terms of
checking them in our code based on the current places we use them?  If the
current approach will be unlikely to change in the future, then at least we
can document that the way I went about things is consistent with current
best practices so next time someone doesn't really wonder.


>
> regards, tom lane
>


-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: Proposal for Signal Detection Refactoring

2018-09-26 Thread Chris Travers
On Wed, Sep 26, 2018 at 12:41 AM Michael Paquier 
wrote:

> On Mon, Sep 24, 2018 at 09:27:35PM -0700, Andres Freund wrote:
> > On 2018-09-25 11:50:47 +0900, Michael Paquier wrote:
> >> PGDLLIMPORT changes don't get back-patched as well...
> >
> > We've been more aggressive with that lately, and I think that's good. It
> > just is a unnecessarily portability barrier for extension to be
> > judicious in applying that.
>
> Agreed.  Are there any objections with the proposal of changing the
> interruption pending flags in miscadmin.h to use sig_atomic_t?
> ClientConnectionLost would gain PGDLLIMPORT at the same time.
>

I am strongly in favor of doing this.


> --
> Michael
>


-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: Proposal for Signal Detection Refactoring

2018-09-25 Thread Chris Travers
On Mon, Sep 24, 2018 at 7:06 PM Andres Freund  wrote:

> On 2018-09-24 11:45:10 +0200, Chris Travers wrote:
> > I did some more reading.
> >
> > On Mon, Sep 24, 2018 at 10:15 AM Chris Travers  >
> > wrote:
> >
> > > First, thanks for taking the time to write this.  Its very helpful.
> > > Additional thoughts inline.
> > >
> > > On Mon, Sep 24, 2018 at 2:12 AM Michael Paquier 
> > > wrote:
> > >
> > >>
> > >> There could be value in refactoring things so as all the *Pending
> flags
> > >> of miscadmin.h get stored into one single volatile sig_atomic_t which
> > >> uses bit-wise markers, as that's at least 4 bytes because that's
> stored
> > >> as an int for most platforms and can be performed as an atomic
> operation
> > >> safely across signals (If my memory is right;) ).  And this leaves a
> lot
> > >> of room for future flags.
> > >>
> > >
> > > Yeah I will look into this.
> > >
> >
> >
> > Ok so having looked into this a bit more
> >
> > It looks like to be strictly conforming, you can't just use a series of
> > flags because neither C 89 or 99 guarantee that sig_atomic_t is
> read/write
> > round-trip safe in signal handlers.  In other words, if you read, are
> > pre-empted by another signal, and then write, you may clobber what the
> > other signal handler did to the variable.  So you need atomics, which are
> > C11
> >
> > What I would suggest instead at least for an initial approach is:
> >
> > 1.  A struct of volatile bools statically stored
> > 2.  macros for accessing/setting/clearing flags
> > 3.  Consistent use of these macros throughout the codebase.
> >
> > To make your solution work it looks like we'd need C11 atomics which
> would
> > be nice and maybe at some point we decide to allow newer feature, or we
> > could wrap this itself in checks for C11 features and provide atomic
> flags
> > in the future.  It seems that the above solution would strictly comply to
> > C89 and pose no concurrency issues.
>
> It's certainly *NOT* ok to use atomics in signal handlers
> indiscriminately, on some hardware / configure option combinations
> they're backed by spinlocks (or even semaphores) and thus *NOT*
> interrupt safe.
>
> This doesn't seem to solve an actual problem, why are we discussing
> changing this? What'd be measurably improved, worth the cost of making
> backpatching more painful?


I am not sure if this is measurable as a problem but I am fairly sure this
is tangible and is likely to become more so over time.

Right now, the current approach is to use a series of globally defined
single variables for handling interrupt conditions.  Because this has grown
organically over time, the naming of these variables doesn't have a
high degree of consistency.  Consequently, if one has code which needs to
check why it was interrupted, this is both a fragile process and one which
imposes a dependency directly on internals.  We now have two different
areas in the code which handle interrupt conditions, and two more
places that non-intrusively check for whether certain kinds of interrupt
conditions are pending.

My sense is that the number of places which have to be aware of these sorts
of things is likely to grow in the future.  Therefore I think it would be a
good idea sooner rather than later to ensure that both the structures
and the mechanisms and interfaces are forward-looking, and make stability
guarantees we can hold well into the future without burdening code
maintenance much.  So maybe there is some short-term pain in back porting
but the goal is to make sure that long-term backporting is easier, and that
checking pending interrupt status is safer.

I am not sold on using bit flags here.  I don't think they are C89 or 99
safe (maybe in C11 with caveats).

My suspicion is that when I get to a second attempt at this problem, it
will look much closer to what we have now, just better encapsulated.
-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: Proposal for Signal Detection Refactoring

2018-09-25 Thread Chris Travers
On Tue, Sep 25, 2018 at 3:03 AM Tom Lane  wrote:

> Michael Paquier  writes:
> > And then within separate signal handlers things like:
> > void
> > StatementCancelHandler(SIGNAL_ARGS)
> > {
> > [...]
> > signalPendingFlags |= PENDING_INTERRUPT | PENDING_CANCEL_QUERY;
> > [...]
> > }
>
> AFAICS this still wouldn't work.  The machine code is still going to
> look (on many machines) like "load from signalPendingFlags,
> OR in some bits, store to signalPendingFlags".  So there's still a
> window for another signal handler to interrupt that and store some
> bits that will get lost.
>
> You could only fix that by blocking all signal handling during the
> handler, which would be expensive and rather pointless.
>
> I do not think that it's readily possible to improve on the current
> situation with one sig_atomic_t per flag.
>


After a fair bit of reading I think there are ways of doing this in C11 but
I don't think those are portable to C99.

In C99 (and, in practice C89, as the C99 committee noted there were no
known C89 implementations where reading was unsafe), reading or writing a
static sig_atomic_t inside a signal handler is safe, but a round-trip is
*not* guaranteed not to clobber.  While I could be wrong, I think it is
only in C11 that you have any round-trip operations which are guaranteed
not to clobber in the language itself.

Basically, we are a long way out from being able to treat these as a single
value of bit flags.

However,  what I think one could do is use a struct of volatile
sig_atomic_t members and macros for checking/setting.  Simply writing a
value is safe in C89 and higher.
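
To make that concrete, here is a minimal sketch of the struct-plus-macros
idea (names are invented; this is illustrative only, not a patch):

#include <signal.h>

/*
 * Illustrative only.  A read-modify-write such as "flags |= X" inside a
 * handler compiles to load / or / store, so a second handler firing in
 * between can have its update clobbered.  Plain assignment of a whole value
 * to a volatile sig_atomic_t is the pattern C89/C99 do guarantee.
 */
typedef struct PendingSignalFlags
{
    volatile sig_atomic_t interrupt_pending;
    volatile sig_atomic_t query_cancel_pending;
    volatile sig_atomic_t proc_die_pending;
} PendingSignalFlags;

static PendingSignalFlags pending_signals;

/* Handlers and callers only ever assign or read one whole member. */
#define SET_PENDING(flag)    (pending_signals.flag = 1)
#define CLEAR_PENDING(flag)  (pending_signals.flag = 0)
#define IS_PENDING(flag)     (pending_signals.flag != 0)

The struct does not buy any atomicity over separate globals; the point is
only to put the checking and setting behind one small interface.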



> regards, tom lane
>


-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: Proposal for Signal Detection Refactoring

2018-09-24 Thread Chris Travers
Very minor revision
On Mon, Sep 24, 2018 at 11:45 AM Chris Travers 
wrote:

>
> Ok so having looked into this a bit more
>
> It looks like to be strictly conforming, you can't just use a series of
> flags because neither C 89 or 99 guarantee that sig_atomic_t is read/write
> round-trip safe in signal handlers.  In other words, if you read, are
> pre-empted by another signal, and then write, you may clobber what the
> other signal handler did to the variable.  So you need atomics, which are
> C11
>
> What I would suggest instead at least for an initial approach is:
>
> 1.  A struct of volatile bools statically stored
>

These would be implemented as sig_atomic_t which is defined in C89 but has
no atomic operators other than writing the full value.


> 2.  macros for accessing/setting/clearing flags
> 3.  Consistent use of these macros throughout the codebase.
>
> To make your solution work it looks like we'd need C11 atomics which would
> be nice and maybe at some point we decide to allow newer feature, or we
> could wrap this itself in checks for C11 features and provide atomic flags
> in the future.  It seems that the above solution would strictly comply to
> C89 and pose no concurrency issues.
>
> --
>> Best Regards,
>> Chris Travers
>> Head of Database
>>
>> Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
>> Saarbrücker Straße 37a, 10405 Berlin
>>
>>
>
> --
> Best Regards,
> Chris Travers
> Head of Database
>
> Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
> Saarbrücker Straße 37a, 10405 Berlin
>
>

-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: Proposal for Signal Detection Refactoring

2018-09-24 Thread Chris Travers
I did some more reading.

On Mon, Sep 24, 2018 at 10:15 AM Chris Travers 
wrote:

> First, thanks for taking the time to write this.  It's very helpful.
> Additional thoughts inline.
>
> On Mon, Sep 24, 2018 at 2:12 AM Michael Paquier 
> wrote:
>
>>
>> There could be value in refactoring things so as all the *Pending flags
>> of miscadmin.h get stored into one single volatile sig_atomic_t which
>> uses bit-wise markers, as that's at least 4 bytes because that's stored
>> as an int for most platforms and can be performed as an atomic operation
>> safely across signals (If my memory is right;) ).  And this leaves a lot
>> of room for future flags.
>>
>
> Yeah I will look into this.
>


Ok so having looked into this a bit more

It looks like, to be strictly conforming, you can't just use a series of
flags because neither C89 nor C99 guarantees that sig_atomic_t is read/write
round-trip safe in signal handlers.  In other words, if you read, are
pre-empted by another signal, and then write, you may clobber what the
other signal handler did to the variable.  So you need atomics, which are
C11.

What I would suggest instead at least for an initial approach is:

1.  A struct of volatile bools statically stored
2.  macros for accessing/setting/clearing flags
3.  Consistent use of these macros throughout the codebase.

To make your solution work it looks like we'd need C11 atomics which would
be nice, and maybe at some point we decide to allow newer features, or we
could wrap this itself in checks for C11 features and provide atomic flags
in the future.  It seems that the above solution would strictly comply to
C89 and pose no concurrency issues.

-- 
> Best Regards,
> Chris Travers
> Head of Database
>
> Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
> Saarbrücker Straße 37a, 10405 Berlin
>
>

-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: Proposal for Signal Detection Refactoring

2018-09-24 Thread Chris Travers
First, thanks for taking the time to write this.  It's very helpful.
Additional thoughts inline.

On Mon, Sep 24, 2018 at 2:12 AM Michael Paquier  wrote:

> On Fri, Sep 21, 2018 at 12:35:46PM +0200, Chris Travers wrote:
> > I understand how lock levels don't fit a simple hierarchy but at least
> > when it comes to what is going to be aborted on a signal, I am having
> > trouble understanding the problem here.
>
> It may be possible to come with a clear hierarchy with the current
> interruption types in place.  Still I am not sure that the definition
> you put behind is completely correct, and I think that we need to
> question as well the value of putting such restrictions for future
> interruption types because they would need to fit into it.


The future-safety issue is a really good one and it's one reason I kept the
infinite loop patch as semantically consistent with the API as I could at
the cost of some complexity.

I have another area where I think a patch would be more valuable anyway in
terms of refactoring.


>   That's quite
> a heavy constraint to live with.  There is such logic with wal_level for
> example, which is something I am not completely happy with either...
> But this one is a story for another time, and another thread.
>

From a cleanup perspective, a concentric-circles approach (which would
correspond to a hierarchy of interrupts) seems correct to me, but I can see
that assuming all pending interrupts would be checked solely for cleanup
reasons might be a bad assumption on my part.

>
> Regarding your patch, it seems to me that it does not improve
> readability as I mentioned up-thread because you lose sight of what can
> be interrupted in a given code path, which is what the current code
> shows actually nicely.
>

So I guess there are two fundamental questions here.

1.  Do we want to move away from checking global flags like this directly?
I think we do because it makes future changes possibly harder and more
complex since there is no encapsulation of logic.  But I don't see a point
in putting effort into that without consensus.

>
> There could be value in refactoring things so as all the *Pending flags
> of miscadmin.h get stored into one single volatile sig_atomic_t which
> uses bit-wise markers, as that's at least 4 bytes because that's stored
> as an int for most platforms and can be performed as an atomic operation
> safely across signals (If my memory is right;) ).  And this leaves a lot
> of room for future flags.
>

Yeah I will look into this.

Thanks again for taking the time to go over the concerns in detail.  It
really helps.

Best Wishes,
Chris Travers

> --
> Michael
>


-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: proposal: prefix function

2018-09-21 Thread Chris Travers
On Fri, Sep 21, 2018 at 10:09 AM Pavel Stehule 
wrote:

> Hi
>
> can we implement prefix function for fast test if substr is prefix of some
> string?
>
> create or replace function prefix(str text, substr text)
> returns boolean as $$
>   select substr(str, 1, length(substr)) = substr
> $$ language sql;
>
> This function can be very effective in C language. Now it should be
> implemented with like or regexp, what is significantly more expensive.
>
>
These would just be wrappers around already existing internal functions,
right?
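
For what it's worth, a minimal C-level version might look roughly like this
(purely illustrative: the function name is invented and this is not a
proposed patch):

#include "postgres.h"
#include "fmgr.h"

PG_FUNCTION_INFO_V1(text_prefix);    /* hypothetical name */

Datum
text_prefix(PG_FUNCTION_ARGS)
{
    text   *str = PG_GETARG_TEXT_PP(0);
    text   *pre = PG_GETARG_TEXT_PP(1);
    int     plen = VARSIZE_ANY_EXHDR(pre);

    /* A prefix longer than the string can never match. */
    if (plen > VARSIZE_ANY_EXHDR(str))
        PG_RETURN_BOOL(false);

    /* Byte-wise comparison is enough for an exact prefix test. */
    PG_RETURN_BOOL(memcmp(VARDATA_ANY(str), VARDATA_ANY(pre), plen) == 0);
}

A byte-wise compare suffices for an exact prefix test within a single
encoding, which is also why this should be cheaper than going through LIKE
or a regular expression.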


> Regards
>
> Pavel
>


-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: Proposal for Signal Detection Refactoring

2018-09-21 Thread Chris Travers
On Fri, Sep 21, 2018 at 6:46 AM Michael Paquier  wrote:

> On Thu, Sep 20, 2018 at 03:08:34PM +0200, Chris Travers wrote:
> > So here's a small patch.  I will add it for the next commit fest unless
> > anyone has any reason I shouldn't.
>
> -   return InterruptPending && (QueryCancelPending || ProcDiePending);
> +   return PENDING_INTERRUPT_LEVEL() >= QUERY_CANCEL;
>
> This is pretty similar to lock levels, where it is pretty hard to put a
> strict monotone hierarchy when it comes to such interruptions.


I understand how lock levels don't fit a simple hierarchy, but at least when
it comes to what is going to be aborted on a signal, I am having trouble
understanding the problem here.

Does anyone have any search terms I could use to look for this issue in the
archives?

I will assume this patch will be rejected then.



> The new
> code does not seem like an improvement either, as for example in the code
> mentioned above, you know directly what are the actions involved, which
> is not the case with the new code style.
>

The initial motivation for this patch was that it felt a little bit odd to
be using global boolean flags for this sort of approach, and I was concerned
that if this approach proliferated it might cause problems later.  As far as
I know my previous patch was the second use of the current pattern.

So one thing I wonder is if there are ways where abstracting this would
make more sense.


> --
> Michael
>


-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Proposal for Signal Detection Refactoring

2018-09-20 Thread Chris Travers
After the patch we did for the race condition here, one point which was
brought up which I thought was quite relevant was the fact that checking
global flags feels a little ugly.

I would like to set up a patch which refactors this as follows:

1.  We create an enum of cancellation levels:
NO_INTERRUPT
INTERRUPT
QUERY_CANCEL
PROCESS_EXIT

2.  We create a macro called PENDING_INTERRUPT_LEVEL() which returns the
highest pending interrupt level.

3.  We check on whether the output of this is greater or equal to the level
we need to handle and work on that basis.

This allows us to handle cases where we just want to check the interrupt
level for return or cleanup purposes.  It does not apply to cases where we
need to change interrupt handling temporarily as we do in a couple of
cases.  I think it is cleaner than checking the QueryCancelPending and
ProcDiePending global flags directly.
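
Roughly, the shape I have in mind looks like this (a sketch only, built on
the existing flags declared in miscadmin.h; the actual patch may differ in
detail):

typedef enum PendingInterruptLevel
{
    NO_INTERRUPT = 0,
    INTERRUPT,
    QUERY_CANCEL,
    PROCESS_EXIT
} PendingInterruptLevel;

/*
 * Highest-priority pending interrupt, derived from the existing globals
 * (InterruptPending, QueryCancelPending, ProcDiePending).  Callers compare
 * against the level they care about, e.g.
 * if (PENDING_INTERRUPT_LEVEL() >= QUERY_CANCEL) ...
 */
#define PENDING_INTERRUPT_LEVEL() \
    (ProcDiePending ? PROCESS_EXIT : \
     QueryCancelPending ? QUERY_CANCEL : \
     InterruptPending ? INTERRUPT : NO_INTERRUPT)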

So here's a small patch.  I will add it for the next commit fest unless
anyone has any reason I shouldn't.

-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


sig_refactor.patch
Description: Binary data


Re: Code of Conduct plan

2018-09-19 Thread Chris Travers
> There are intangible but very real (IMO) costs to being a community that
> welcomes an unhealthy and hostile communication style, harassment and
> personal attacks in the guise of technical argument, bullying defended as
> making sure you have the right stuff to survive in a "meritocracy", etc.
> Thankfully we are generally not such a community. But try asking a few
> women you know in the Postgres community - if you can find any! - how their
> experience at conferences has been. Then ask if maybe there are still a few
> things we could work on changing.
>
> I've found it quite confronting dealing with some of the more heated
> exchanges on hackers from some of our most prominent team members. I've
> sent the occasional gentle note to ask someone to chill and pause before
> replying, too. And I've deserved to receive one a couple of times, though I
> never have, as I'm far from free from blame here.
>

But that happens to everyone.  Male, female, etc.  And yes, such notes are
good.

I think you are right to point to harassment though.  I have seen people in
this community resort to some really aggressive tactics with other members,
particularly off-list (and sometimes in person).  The interactions on the
postgresql.org infrastructure have always been good except in a few cases.
That is the one really important reason for enforcement against off-list
actions.  It is not (and can't be) about politics.  It has to be about
personally directed campaigns of harassment.

>
> People love to point to LKML as the way it "must" be done to succeed in
> software. Yet slowly that community has also come to recognise that verbal
> abuse under the cloak of technical discussion is harmful to quality
> discussion and drives out good people, harming the community long term.
> Sure, not everything has to be super-diplomatic, but there's no excuse for
> verbal bullying and wilful use of verbal aggression either. As widely
> publicised, even Linus has recently recognised aspects of this, despite
> being the poster child of proponents of abusive leadership for decades.
>
> We don't have a culture like that. So in practice, I don't imagine the CoC
> will see much use. The real problematic stuff that happens in this
> community happens in conference halls and occasionally by private mail,
> usually in the face of a power imbalance that makes the recipient/victim
> reluctant to speak out. I hope a formal CoC will give them some hope
> they'll be heard if they do take the personal risk to speak up. I've seen
> so much victim blaming in tech that I'm not convinced most people
> experiencing problems will be willing to speak out anyway, but hopefully
> they'll be more so with a private and receptive group to talk to.
>

I will say also that where I have seen the most problems I would not speak
out in detail because I don't feel like they rise to a level where the CoC
should be involved.


>
> Let me be clear here, I'm no fan of trial by rabid mob. That's part of why
> something like the CoC and a backing body is important. Otherwise people
> are often forced to silently endure, or go loudly public. The latter tends
> to result in a big messy explosion that hurts the community, those saying
> they're victim(s) and the alleged perpetrator(s), no matter what the facts
> and outcomes. It also encourages people to jump on one comment and run way
> too far with it, instead of looking at patterns and giving people chances
> to fix their behaviour.
>
> I don't want us to have this:
> https://techcrunch.com/2013/03/21/a-dongle-joke-that-spiraled-way-out-of-control/
> . Which is actually why I favour a CoC, one with a resolution process and
> encouragement toward some common sense. Every player in that story was an
> idiot, and while none deserved the abuse and harrassment that came their
> way, it's a shame it wan't handled by a complaint to a conference CoC group
> instead.
>
> I'd like the CoC to emphasise that while we don't want to restrain people
> from "calling out" egregious behaviour, going via the CoC team is often
> more likely to lead to constructive communication and positive change.
>

Agreed on this.

My objection to the additional wording is simply that a) I think it does
not tackle the problem it needs to tackle, and b) creates a claim which
covers a bunch of things that it really shouldn't.  It's a serious bug and
I still hope it gets fixed before it causes problems.

>
> --
>  Craig Ringer   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>


-- 
Best Wishes,
Chris Travers

Efficito:  Hosted Accounting and ERP.  Robust and Flexible.  No vendor
lock-in.
http://www.efficito.com/learn_more


Re: Code of Conduct

2018-09-18 Thread Chris Travers
On Tue, Sep 18, 2018 at 4:35 PM Tomas Vondra 
wrote:

> On 09/18/2018 01:47 PM, James Keener wrote:
> >  > following a long consultation process
> >
> > It's not a consultation if any dissenting voice is simply ignored.
> > Don't sugar-coat or politicize it like this -- it was rammed down
> > everyone's throats. That is core's right, but don't act as everyone's
> > opinions and concerns were taken into consideration.
>
> I respectfully disagree.
>
> I'm not sure which dissenting voices you think were ignored, but from
> what I've observed in the various CoC threads the core team took the
> time to respond to all comments. That does not necessarily mean the
> resulting CoC makes everyone happy, but unfortunately that's not quite
> possible. And it does not mean it was not an honest consultation.
>
> IMO the core team did a good job in listening to comments, tweaking the
> wording and/or explaining the reasoning. Kudos to them.
>

I said I would stand aside my objections after the last point I mentioned
them but I did not feel that my particular objection and concern with
regard to one specific sentence added got much of a hearing.  This being
said, it is genuinely hard to sort through the noise and try to reach the
signal.  I think the resurgence of the debate about whether we need a code
of conduct made it very difficult to discuss specific objections to
specific wording.  So to be honest the breakdown was mutual.

>
> > There are a good number of folks who are concerned that this CoC is
> > overreaching and is ripe for abuse. Those concerns were always
> > simply, plainly, and purposely ignored.
> No, they were not. There were multiple long discussions about exactly
> these dangers, You may dislike the outcome, but it was not ignored.
>

Also those of us who had specific, actionable concerns were often drowned
out by the noise.  That's deeply unfortunate.

I think those of us who had specific concerns about one specific sentence
that was added were drowned out by those who seemed to be opposed to the
idea of a code of conduct generally.

I would have appreciated at least a reason why the concerns I had about the
fact that the addition a) doesn't cover what it needs to cover, and b)
will attract complaints that it shouldn't cover were not considered valid.
But I can understand that given the noise-to-signal ratio of the discussion
made such discussion next to impossible.

Again I find that regrettable.

>
> >  > Please take time to read and understand the CoC, which is intended to
> > ensure that PostgreSQL remains an open and enjoyable project for anyone
> > to join and participate in.
> >
> > I sincerely hope so, and that it doesn't become a tool to enforce social
> > ideology like in other groups I've been part of. Especially since this
> > is the main place to come to get help for PostgreSQL and not a social
> club.
> >
>
> Ultimately, it's a matter of trust that the CoC committee and core team
> apply the CoC in a careful and cautious way. Based on my personal
> experience with most of the people involved in both groups I'm not
> worried about this part.
>

I would actually go further than you here.  The CoC committee *cannot*
apply the CoC in the way that the opponents fear.  The fact is, Europe has
anti-discrimination laws regarding social and political ideology (something
the US might want to consider as it would help avoid problems on this list
;-) ).  And different continents have different norms on these sorts of
things.  Pushing a social ideology via the code of conduct would, I
suspect, result in everything from legal action to large emerging markets
going elsewhere.  So I don't think it is a question of "trust us" but
rather that the community won't let that sort of abuse happen no matter who
is on the CoC committee.

>
>
> regards
>
> --
> Tomas Vondra  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
>

-- 
Best Wishes,
Chris Travers

Efficito:  Hosted Accounting and ERP.  Robust and Flexible.  No vendor
lock-in.
http://www.efficito.com/learn_more


Re: Code of Conduct plan

2018-09-17 Thread Chris Travers
On Mon, Sep 17, 2018 at 5:28 PM Joshua D. Drake 
wrote:

> On 09/17/2018 08:11 AM, Dmitri Maziuk wrote:
>
> On Sun, 16 Sep 2018 12:52:34 +
> Martin Mueller  
>  wrote:
>
>
> ... The overreach is dubious on both practical and theoretical grounds. 
> "Stick to your knitting " or the KISS principle seem good advice in this 
> context.
>
> Moderated mailing lists ain't been broken all these years, therefore they 
> need fixing. Obviously.
>
>
> Folks,
>
> At this point it is important to accept that the CoC is happening. We
> aren't going to stop that. The goal now is to insure a CoC that is
> equitable for all community members and that has appropriate
> accountability. At hand it appears that major concern is the CoC trying to
> be authoritative outside of community channels. As well as wording that is
> a bit far reaching. Specifically I think people's main concern is these two
> sentences:
>
> "To that end, we have established this Code of Conduct for community
> interaction and participation in the project’s work and the community at
> large. This Code is meant to cover all interaction between community
> members, whether or not it takes place within postgresql.org
> infrastructure, so long as there is not another Code of Conduct that takes
> precedence (such as a conference's Code of Conduct)."
>

Exactly.  And actually the first sentence is not new.  The second one is a
real problem though.  I am going to try one last time at an additional
alternative.

" To that end, we have established this Code of Conduct for community
interaction and participation in the project’s work and the community at
large.   This code of conduct covers all interaction between community
members on the postgresql.org infrastructure.  Conduct outside the
postgresql.org infrastructure may call the Code of Conduct committee to act
as long as the interaction (or interaction pattern) is community-related,
other parties are unable to act, and the Code of Conduct committee
determines that it is in the best interest of the community to apply this
Code of Conduct."

This solves a number of important problems.

1.  It provides a backstop (as Tom Lane suggested was needed) against a
conference refusing to enforce their own code of conduct in a way the
community finds acceptable while the current wording does not provide any
backstop as long as there is a code of conduct for a conference.
2.  It provides a significant barrier to applying the code of conduct to,
say, political posts on Twitter.
3.  It preserves the ability of the Code of Conduct Committee to act in the
case where someone takes a pattern of harassment off-list and
off-infrastructure.  And it avoids arguing whether Facebook's Community
Standards constitute "another Code of Conduct that takes precedence."

>
> If we can constructively provide feedback about those two sentences, great
> (or constructive feedback on other areas of the CoC). If we can't then this
> thread needs to stop. It has become unproductive.
>
> My feedback is that those two sentences provide an overarching authority
> that .Org does not have the right to enforce and that it is also largely
> redundant because we allow that the idea that if another CoC exists, then
> ours doesn't apply. Well every single major collaboration channel we would
> be concerned with (including something like Blogger) has its own CoC within
> its Terms of use. That effectively neuters the PostgreSQL CoC within places
> like Slack, Facebook, Twitter etc...
>

Fascinating that this would, on its face, not apply to a harassment
campaign carried out over twitter, but it would apply to a few comments
made over drinks at a bar.

>
> JD
>
> --
> Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc
> ***  A fault and talent of mine is to tell it exactly how it is.  ***
> PostgreSQL centered full stack support, consulting and development.
> Advocate: @amplifypostgres || Learn: https://postgresconf.org
> * Unless otherwise stated, opinions are my own.   *
>
>

-- 
Best Wishes,
Chris Travers

Efficito:  Hosted Accounting and ERP.  Robust and Flexible.  No vendor
lock-in.
http://www.efficito.com/learn_more


Strange OSX make check-world failure

2018-09-17 Thread Chris Travers
Logs are below.  This happens on master, and on 10.  I suspect it is an
issue with something regarding ecpg.  Wondering what I am doing wrong.

== creating temporary instance==
== initializing database system   ==
== starting postmaster==
running on port 50853 with PID 20314
== creating database "ecpg1_regression"   ==
CREATE DATABASE
ALTER DATABASE
== creating database "ecpg2_regression"   ==
CREATE DATABASE
ALTER DATABASE
== creating role "regress_ecpg_user1" ==
CREATE ROLE
GRANT
GRANT
== creating role "regress_ecpg_user2" ==
CREATE ROLE
GRANT
GRANT
== running regression test queries==
test compat_informix/dec_test ... ok
test compat_informix/charfuncs ... ok
test compat_informix/rfmtdate ... ok
test compat_informix/rfmtlong ... ok
test compat_informix/rnull    ... ok
test compat_informix/sqlda    ... ok
test compat_informix/describe ... ok
test compat_informix/test_informix ... ok
test compat_informix/test_informix2 ... ok
test connect/test2    ... ok
test connect/test3    ... ok
test connect/test4    ... ok
test connect/test5    ... ok
test pgtypeslib/dt_test   ... stdout stderr FAILED (test process was terminated by signal 6: Abort trap)
test pgtypeslib/dt_test2  ... stdout stderr FAILED (test process was terminated by signal 6: Abort trap)
test pgtypeslib/num_test  ... stdout stderr FAILED (test process was terminated by signal 6: Abort trap)
test pgtypeslib/num_test2 ... stdout stderr FAILED (test process was terminated by signal 6: Abort trap)
test pgtypeslib/nan_test  ... ok
test preproc/array_of_struct  ... ok
test preproc/pointer_to_struct ... ok
test preproc/autoprep ... ok
test preproc/comment  ... ok
test preproc/cursor   ... ok
test preproc/define   ... ok
test preproc/init ... ok
test preproc/strings  ... ok
test preproc/type ... ok
test preproc/variable ... ok
test preproc/outofscope   ... ok
test preproc/whenever ... ok
test sql/array... ok
test sql/binary   ... ok
test sql/code100  ... ok
test sql/copystdout   ... ok
test sql/define   ... ok
test sql/desc ... ok
test sql/sqlda... stdout stderr FAILED (test process was terminated by signal 6: Abort trap)
test sql/describe ... ok
test sql/dynalloc ... ok
test sql/dynalloc2... ok
test sql/dyntest  ... ok
test sql/execute  ... ok
test sql/fetch... ok
test sql/func ... ok
test sql/indicators   ... ok
test sql/oldexec  ... ok
test sql/quote... ok
test sql/show ... ok
test sql/insupd   ... ok
test sql/parser   ... ok
test thread/thread... ok
test thread/thread_implicit   ... ok
test thread/prep  ... ok
test thread/alloc ... ok
test thread/descriptor... ok
test sql/twophase ... stderr FAILED
== shutting down postmaster   ==

===
 6 of 56 tests failed.
===========

-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: [PATCH] Fix for infinite signal loop in parallel scan

2018-09-17 Thread Chris Travers
On Mon, Sep 17, 2018 at 2:59 PM Oleksii Kliukin  wrote:

>
>
> > On 7. Sep 2018, at 17:57, Chris Travers 
> wrote:
> >
> > Hi;
> >
> > Attached is the patch we are fully testing at Adjust.  There are a few
> non-obvious aspects of the code around where the patch hits.I have run
> make check on Linux and MacOS, and make check-world on Linux (check-world
> fails on MacOS on all versions and all branches due to ecpg failures).
> Automatic tests are difficult because it is to a rare race condition which
> is difficult (or possibly impossible) to automatically create.  Our current
> approach testing is to force the issue using a debugger.  Any ideas on how
> to reproduce automatically are appreciated but even on our heavily loaded
> systems this seems to be a very small portion of queries that hit this case
> (meaning the issue happens about once a week for us).
>
>
> I did some manual testing on it, putting breakpoints in the
> ResolveRecoveryConflictWithLock in the startup process on a streaming
> replica
> (configured with a very low max_standby_streaming_delay, i.e. 100ms) and
> to the
> posix_fallocate call on the normal backend on the same replica. At this
> point I
> also instructed gdb not to stop upon receiving SIGUSR1 (handle SIGUSR1
> nonstop)
> and resumed execution on both the backend and the startup process.
>
> Then I simulated a conflict by creating a rather big (3GB) table on the
> master,
> doing some updates on it and then running an aggregate on the replica
> backend
> (i.e. 'select count(1) from test' with 'force_parallel_mode = true')
> where I
> set the breakpoint. The aggregate and force_parallel_mode ensured that
> the query was executed as a parallel one, leading to allocation of the DSM
>
> Once the backend reached the posix_fallocate call and was waiting on a
> breakpoint, I called 'vacuum full test' on the master that lead to a
> conflict
> on the replica running 'select from test' (in a vast majority of cases),
> triggering the breakpoint in ResolveRecoveryConflictWithLock in the startup
> process, since the startup process tried to cancel the conflicting
> backend. At
> that point I continued execution of the startup process (which would loop
> in
> CancelVirtualTransaction sending SIGUSR1 to the backend while the backend
> waited to be resumed from the breakpoint). Right after that, I changed the
> size
> parameter on the backend to something that would make posix_fallocate run
> for a
> bit longer, typically ten to hundred MB. Once the backend process was
> resumed,
> it started receiving SIGUSR1 from the startup process, resulting in
> posix_fallocate existing with EINTR.
>
> With the patch applied, the posix_fallocate loop terminated right away
> (because
> of QueryCancelPending flag set to true) and the backend went through the
> cleanup, showing an ERROR of cancelling due to the conflict with recovery.
> Without the patch, it looped indefinitely in the dsm_impl_posix_resize,
> while
> the startup process were looping forever, trying to send SIGUSR1.
>
> One thing I’m wondering is whether we could do the same by just blocking
> SIGUSR1
> for the duration of posix_fallocate?
>

Many thanks!

If we were to do that, I would say we should mask all signals we can mask
during the call.

I don't have a problem going down that road instead if people think it is
better.
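
For concreteness, here is a minimal sketch of that alternative (illustrative
only, not code from the attached patch), reusing fd, size, and rc from the
surrounding resize code and only holding off SIGUSR1:

    sigset_t    block,
                save;

    /* Block SIGUSR1 for the duration of posix_fallocate(). */
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);

    if (sigprocmask(SIG_BLOCK, &block, &save) == 0)
    {
        rc = posix_fallocate(fd, 0, size);
        /* Restore the previous mask so pending signals get delivered. */
        sigprocmask(SIG_SETMASK, &save, NULL);
    }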

As a note, we have a patched version of PostgreSQL 10.5 ready to deploy to
a system affected by this and expect that to be done this week.


>
> Cheers,
> Oleksii Kliukin



-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: [PATCH] Fix for infinite signal loop in parallel scan

2018-09-17 Thread Chris Travers
First, I have attached a cleaned-up revision (pg_indent, removing a
dangling comment etc)

On Fri, Sep 14, 2018 at 1:16 PM Thomas Munro 
wrote:

> On Sat, Sep 8, 2018 at 3:57 AM Chris Travers 
> wrote:
> > Attached is the patch we are fully testing at Adjust.
>
> Thanks!
>
> > I have run make check on Linux and MacOS, and make check-world on Linux
> (check-world fails on MacOS on all versions and all branches due to ecpg
> failures).
>
> FWIW it's entirely possible to get make check-world passing on a Mac.
> Maybe post the problem you're seeing to a new thread?
>

Will do.

>
> > ...
>
> > In the past it had been suggested we do PG_TRY(); and PG_CATCH(), but
> given that it is not consistent whether we can raise an error or whether we
> MUST raise an error, I don't see how this approach can work.  As far as I
> can see, we MUST raise an error in the appropriate spot if and only if
> elevel is set to a sufficient level.
>
> Yeah, your way looks a bit nicer than something involving PG_TRY().
>
> > Is there any feedback on this approach before I add it to the next
> commitfest?
>
> Please go ahead and add it.  Being a bug fix, we'll commit it sooner
> than the open commitfest anyway, but it's useful to have it in there.
>
> + if (errno == EINTR && elevel >= ERROR)
> + CHECK_FOR_INTERRUPTS();
>
> I think we might as well just CHECK_FOR_INTERRUPTS() unconditionally.
> In this branch elevel is always ERROR as you noted, and the code
> around there is confusing enough already.
>

The reason I didn't do that is future safety and the off chance that someone
could do something strange with extensions today that might use this in an
unsafe way.  While I cannot think of any use case for calling this in an
unsafe way, I know for a fact that one might write extensions, background
workers, etc. that do things with this API that are not in our current code
right now.  For something that could be back ported I really want to work as
much as possible in a way that does not break even exotic extensions.

Longer term I would like to see if I can help refactor this code so that
responsibility for error handling is clearer and such problems cannot
exist.  But that's not something to back port.

>
> + } while (rc == EINTR && !(ProcDiePending || QueryCancelPending));
>
> There seems to be a precedent for checking QueryCancelPending directly
> to break out of a loop in regcomp.c and syncrep.c.  So this seems OK.
>

Yeah, I checked.


> Hmm, I wonder if there should be an INTERRUPT_PENDING() macro that
> hides those details, but allows you to break out of a loop and then do
> some cleanup before CHECK_FOR_INTERRUPT().


That is a good idea.
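
Something along these lines is what I would picture (purely hypothetical,
nothing like this exists in core today), shown against the resize loop:

    /* Test for a pending cancel/die request without raising it yet. */
    #define INTERRUPT_PENDING() \
        (QueryCancelPending || ProcDiePending)

    do
    {
        rc = posix_fallocate(fd, 0, size);
    } while (rc == EINTR && !INTERRUPT_PENDING());
    /* ... do cleanup here, then CHECK_FOR_INTERRUPTS() ... */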

>
> --
> Thomas Munro
> http://www.enterprisedb.com
>


-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


racecond.patch
Description: Binary data


Re: Code of Conduct plan

2018-09-14 Thread Chris Travers
On Fri, Sep 14, 2018 at 4:14 PM Dave Page  wrote:

>
>
> On Fri, Sep 14, 2018 at 3:08 PM, Joshua D. Drake 
> wrote:
>
>> On 09/14/2018 01:31 AM, Chris Travers wrote:
>>
>>
>> I apologize for the glacial slowness with which this has all been moving.
>>> The core team has now agreed to some revisions to the draft CoC based on
>>> the comments in this thread; see
>>>
>>> https://wiki.postgresql.org/wiki/Code_of_Conduct
>>>
>>> (That's the updated text, but you can use the diff tool on the page
>>> history tab to see the changes from the previous draft.)
>>>
>>
>> I really have to object to this addition:
>> "This Code is meant to cover all interaction between community members,
>> whether or not it takes place within postgresql.org infrastructure, so
>> long as there is not another Code of Conduct that takes precedence (such as
>> a conference's Code of Conduct)."
>>
>> That covers things like public twitter messages over live political
>> controversies which might not be personally directed.   At least if one is
>> going to go that route, one ought to *also* include a safe harbor for
>> non-personally-directed discussions of philosophy, social issues, and
>> politics.  Otherwise, I think this is asking for trouble.  See, for
>> example, what happened with Opalgate and how this could be seen to
>> encourage use of this to silence political controversies unrelated to
>> PostgreSQL.
>>
>>
>> I think this is a complicated issue. On the one hand, postgresql.org has
>> no business telling people how to act outside of postgresql.org. Full
>> stop.
>>
>
> I'm going to regret jumping in here, but...
>
> I disagree. If a community member decides to join forums for other
> software and then strongly promotes PostgreSQL to the point that they
> become abusive or offensive to people making other software choices, then
> they are clearly bringing the project into disrepute and we should have
> every right to sanction them by preventing them participating in our
> project in whatever ways are deemed appropriate.
>

 Actually, the easier case here is not being abusive to MySQL users, as the
code of conduct really doesn't clearly cover that anyway.  The easier case
is where two people have a feud and one person carries on a harassment
campaign over various forms of social media.  The current problem is:

1.  The current code of conduct is not clear as to whether the terms of
service/community standards of, say, Reddit, supersede it or not, and
2.  The community has to act (even if it includes behavior at a
conference which has its own code of conduct)

So I think the addition is both over-inclusive and under-inclusive.  It is
over-inclusive because it invites a certain group of (mostly American)
people to pick fights (not saying this is all Americans).  And it is
under-inclusive because there are cases where the code of conduct *should*
be applied to behavior at events which have their own codes of conduct.

On the other side, consider someone carrying on a low-grade harassment
campaign against another community member at a series of conferences, where
what happens at each conference may not amount to a real actionable concern
but the pattern as a whole might.  That is the under-inclusive bit.

So I don't like this clause because I think it invites problems and doesn't
solve issues.
-- 
Best Wishes,
Chris Travers

Efficito:  Hosted Accounting and ERP.  Robust and Flexible.  No vendor
lock-in.
http://www.efficito.com/learn_more


Re: Code of Conduct plan

2018-09-14 Thread Chris Travers
On Fri, Sep 14, 2018 at 11:45 AM Ilya Kosmodemiansky 
wrote:

> On Fri, Sep 14, 2018 at 10:31 AM, Chris Travers 
> wrote:
> > I really have to object to this addition:
> > "This Code is meant to cover all interaction between community members,
> > whether or not it takes place within postgresql.org infrastructure, so
> long
> > as there is not another Code of Conduct that takes precedence (such as a
> > conference's Code of Conduct)."
> >
> > That covers things like public twitter messages over live political
> > controversies which might not be personally directed.   At least if one
> is
> > going to go that route, one ought to *also* include a safe harbor for
> > non-personally-directed discussions of philosophy, social issues, and
> > politics.  Otherwise, I think this is asking for trouble.  See, for
> example,
> > what happened with Opalgate and how this could be seen to encourage use
> of
> > this to silence political controversies unrelated to PostgreSQL.
>
> I think, this point has nothing to do with _correct_ discussions or
> public tweets.
>
> If one community member tweets publicly and in a way which abuses
> other community members, it is obvious CoC violation. It is hard to
> imagine healthy community if someone interacts with others  correctly
> on the list or at a conference because the CoC stops him doing things
> which he will do on private capacity to the same people when CoC
> doesnt apply.
>
> If someone reports CoC violation just because other community member's
> _correct_ public tweet or whatsoever  expressed different
> political/philosophical/religious views, this is a quite different
> story. I suppose CoC committee and/or Core team in this case should
> explain the reporter the purpose of CoC rather than automatically
> enforce it.
>

So first, I think what the clause is trying to do is address cases where
harassment targeting a particular community member takes place outside the
infrastructure, and frankly, ensuring that the code of conduct applies in
those cases is important and something I agree with.

However, let's look at problem cases:

"I am enough of a Marxist to see gender as a qualitative relationship to
biological reproduction and maybe economic production too."

I can totally imagine someone arguing that such a tweet might be abusive,
and certainly not "correct."

Or consider:

"The effort to push GLBT rights on family-business economies is nothing
more than an effort at corporate neocolonialism."

Which would make the problem more clear.  Whether or not a comment like
that occurring outside postgresql.org infrastructure would be considered
"correct" or "abusive" is ultimately a political decision and something
which, once that fight is picked, has no reasonable solution in an
international and cross-cultural product (where issues like sexuality,
economics, and how gender and individualism intersect will vary
dramatically across members around the world).  There are people who will
assume that both of the above statements are personally offensive attacks on
the basis of gender identity, even if they are critiques of political
agendas severable from that.  Worse, the sense of attack could itself be
seen as an attack on the cultures or religions of other participants.

Now neither of these comments would be tolerated as viewpoints expressed on
PostgreSQL.org email lists because they are off-topic, but once one expands
the code of conduct in this way they become fair game.  Given the way
culture war issues are shaping up particularly in the US, I think one has
to be very careful not to set an expectation that this applies to literally
everything that anyone does anywhere.

So maybe something more like:

"Conduct that occurs outside the postgresql.org infrastructure is not
automatically excluded from enforcement of this code of conduct.  In
particular if other parties are unable to act, and if it is, on balance, in
the interest of the global community to apply the code of conduct, then the
code of conduct shall apply."

>
> > --
> > Best Wishes,
> > Chris Travers
> >
> > Efficito:  Hosted Accounting and ERP.  Robust and Flexible.  No vendor
> > lock-in.
> > http://www.efficito.com/learn_more
>


-- 
Best Wishes,
Chris Travers

Efficito:  Hosted Accounting and ERP.  Robust and Flexible.  No vendor
lock-in.
http://www.efficito.com/learn_more


Re: Code of Conduct plan

2018-09-14 Thread Chris Travers
On Wed, Sep 12, 2018 at 10:53 PM Tom Lane  wrote:

> I wrote:
> > Stephen Frost  writes:
> >> We seem to be a bit past that timeline...  Do we have any update on when
> >> this will be moving forward?
> >> Or did I miss something?
>
> > Nope, you didn't.  Folks have been on holiday which made it hard to keep
> > forward progress going, particularly with respect to selecting the
> initial
> > committee members.  Now that Magnus is back on shore, I hope we can
> > wrap it up quickly --- say by the end of August.
>
> I apologize for the glacial slowness with which this has all been moving.
> The core team has now agreed to some revisions to the draft CoC based on
> the comments in this thread; see
>
> https://wiki.postgresql.org/wiki/Code_of_Conduct
>
> (That's the updated text, but you can use the diff tool on the page
> history tab to see the changes from the previous draft.)
>

I really have to object to this addition:
"This Code is meant to cover all interaction between community members,
whether or not it takes place within postgresql.org infrastructure, so long
as there is not another Code of Conduct that takes precedence (such as a
conference's Code of Conduct)."

That covers things like public twitter messages over live political
controversies which might not be personally directed.   At least if one is
going to go that route, one ought to *also* include a safe harbor for
non-personally-directed discussions of philosophy, social issues, and
politics.  Otherwise, I think this is asking for trouble.  See, for
example, what happened with Opalgate and how this could be seen to
encourage use of this to silence political controversies unrelated to
PostgreSQL.

>
> I think we are about ready to announce the initial membership of the
> CoC committee, as well, but that should be a separate post.
>
> regards, tom lane
>
>

-- 
Best Wishes,
Chris Travers

Efficito:  Hosted Accounting and ERP.  Robust and Flexible.  No vendor
lock-in.
http://www.efficito.com/learn_more


Re: Can I just reload the slave to change primary_conninfo?

2018-09-10 Thread Chris Travers
On Mon, Sep 10, 2018 at 1:36 PM Michael Paquier  wrote:

> On Mon, Sep 10, 2018 at 12:08:07PM +0300, Sergei Kornilov wrote:
> >> This parameter is defined in postgresql.conf
> > Huh, i believe it be in future.
> > Currently it is recovery.conf parameter, and yes - it can be set (or
> > changed) only at database start.
>
> Thanks Sergei for the correction.  Indeed you need to read that as
> recovery.conf, not postgresql.conf.
>

Is there any reason this cannot be changed via a signal?  Is it a general
lack of infrastructure, or is there a significant problem we want to ensure
never happens?

> --
> Michael
>


-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Bug report: Dramatic increase in conflict with recovery after upgrading 10.2->10.5

2018-09-10 Thread Chris Travers
I haven't done significant debugging on this yet, but if anyone else has
seen this problem and can point me at anything, that would be good.
Otherwise we expect to debug and try to fix this problem soon.

After upgrading to PostgreSQL 10.5 from 10.2, we went from one type of
query having about one recovery conflict per month to about three or so
every two days.  The queries in question should usually complete very fast
and should never hit the 30 sec max standby delay.

At present I believe this to likely be a regression.  But if nobody else
knows otherwise, I should know more in a couple days.

-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


[PATCH] Fix for infinite signal loop in parallel scan

2018-09-07 Thread Chris Travers
Hi;

Attached is the patch we are fully testing at Adjust.  There are a few
non-obvious aspects of the code around where the patch hits.  I have run
make check on Linux and MacOS, and make check-world on Linux (check-world
fails on MacOS on all versions and all branches due to ecpg failures).
Automatic tests are difficult because this is due to a rare race condition
which is difficult (or possibly impossible) to create automatically.  Our
current approach to testing is to force the issue using a debugger.  Any
ideas on how to reproduce this automatically are appreciated, but even on
our heavily loaded systems only a very small portion of queries hit this
case (meaning the issue happens about once a week for us).

The main problem is that under certain rare circumstances we see recovery
conflicts going into a loop sending SIGUSR1 to a backend running a parallel
query, which keeps retrying a posix_fallocate call.  The call is perpetually
interrupted by the signal and replication cannot proceed.

The patch attempts to handle cases where we are trying to cancel the query
or terminate the process in the same way we would handle an ENOSPC in the
resize operation, namely to break off the loop, do relevant cleanup, and
then throw relevant exceptions.

There is a very serious complication in this area, however, which is
that dsm_impl_op
takes an elevel parameter which determines whether or not it is safe to
throw errors.  This elevel is then passed to ereport inside the function,
and this function also calls the resize operation itself.  Since this is
not safe with CHECK_FOR_INTERRUPTS(), I conditionally do this only if
elevel is greater or equal to ERROR.

Currently all codepaths here use elevel ERROR when reaching this path but
given that the calling function supports semantics where this could be
lower, and where a caller might have additional cleanup to do, I don't
think one can just add CHECK_FOR_INTERRUPTS() there even though that would
work for now since this could create all kinds of subtle bugs in the future.

So what the patch does is check for whether we are trying to end the query
or the backend and does not retry the resize operation if either of those
conditions are true.  In those cases it will set errno to EINTR and return
the same.
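
To make the idea concrete, the retry loop in dsm_impl_posix_resize() ends up
looking roughly like this (a sketch of the intent; see the attached patch
for the exact code):

    /* Retry on EINTR, but stop once a cancel or die request is pending. */
    do
    {
        rc = posix_fallocate(fd, 0, size);
    } while (rc == EINTR && !(ProcDiePending || QueryCancelPending));

    /* If we bailed out because of an interrupt, report EINTR to the caller. */
    if (rc == EINTR)
        errno = EINTR;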

The only caller then, if the resize operation failed, checks for an elevel
greater or equal to ERROR, and whether the errno is set to EINTR.  If so it
checks for signals.  If these do not abort the query, we then fall through
and pass the ereport with the supplied elevel as we would have otherwise,
and return false to the caller.
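
On the caller side the change is small; roughly (again a sketch, with the
cleanup of the failed segment elided):

    /* After closing/unlinking the failed segment in dsm_impl_posix()... */
    if (errno == EINTR && elevel >= ERROR)
        CHECK_FOR_INTERRUPTS();     /* may raise the cancel/die error here */

    ereport(elevel,
            (errcode_for_dynamic_shared_memory(),
             errmsg("could not resize shared memory segment: %m")));
    return false;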

In current calls to this function, this means that interrupts are handled
right after cleanup of the dynamic shared memory and these then abort the
query or exit the backend.  Future calls could specify a WARNING elevel if
they have extra cleanup to do, in which case signals would not be checked,
and the same warning found today would found in the log.  At the next
CHECK_FOR_INTERRUPTS() call, the appropriate errors would be raised etc.

In the past it had been suggested we do PG_TRY(); and PG_CATCH(), but given
that it is not consistent whether we can raise an error or whether we MUST
raise an error, I don't see how this approach can work.  As far as I can
see, we MUST raise an error in the appropriate spot if and only if elevel
is set to a sufficient level.

Backporting this is pretty trivial.  We expect to confirm this ourselves on
both master and 10.  I can prepare back ports fairly quickly.

Is there any feedback on this approach before I add it to the next
commitfest?

-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


racecond.patch
Description: Binary data


Re: Funny hang on PostgreSQL 10 during parallel index scan on slave

2018-09-06 Thread Chris Travers
On Thu, Sep 6, 2018 at 1:31 PM Chris Travers 
wrote:

>
>
> On Thu, Sep 6, 2018 at 11:08 AM Chris Travers 
> wrote:
>
>> Ok, so here's my current patch (work in progress).  I have not run tests
>> yet, and finding a way to add a test case is virtually impossible though I
>> expect we will find ways of using gdb to confirm local fix later.
>>
>> After careful code review, I settled on the following approach which
>> seemed to be the least intrusive.
>>
>> 1.  Change the retry logic so that it does not retry of
>> QueryCancelPending or ProcDiePending are set.  In these cases EINTR is
>> passed back to the caller
>> 2.  Change the error handling logic of the caller so that EINTR is
>> handled by the next CHECK_FOR_INTERRUPTS() after cleanup.
>>
>> This means that the file descriptor is now already closed and that we
>> handle this the same way we would with a ENOSPC.  The reason for this is
>> that there are many places in the code where it is not clear what degree of
>> safety is present for throwing errors using ereport, and the patch needs to
>> be as unobtrusive as possible to facilitate back porting.
>>
>> At this point the patch is for discussion.  I have not even compiled it
>> or tested it but maybe there is something wrong with my approach so figured
>> I would send it out early.
>>
>> The major assumptions are:
>> 1.  close() will never take longer than the interrupt interval that
>> caused the loop to break.
>> 2.  close() does not get interrupted in a way that will not cause cleanup
>> problems later.
>> 3. We will reach the next interrupt check at a reasonable interval and
>> safe spot.
>>
>> Any initial arguments against?
>>
>
> The previous patch had two issues found on internal code review here.  I
> am sending a new patch.
>
> This patch compiles.  make check passes
> make check-world fails but the errors are the same as found on master and
> involve ecpg.
>
> We will be doing further internal review and verification and then will
> run through pg_indent before final submission.
>
> Changes in this patch:
> 1.  Previous patch had backwards check for EINTR in calling function.
> This was fixed.
> 2.  In cases where ERROR elevel was passed in, function would return
> instead of error out on case of error.
>
> So in this case we check if we expect to error on error and if so, check
> for interrupts.  If not, we go through the normal error handling/logging
> path which might result in an warning that shared memory segment could not
> be allocated followed by an actual meaningful error.  I could put that in
> an else if, if that is seen as a good idea, but figured I would raise it as
> a discussion point.
>

Attaching correct patch

>
>
>>
>> --
>> Best Regards,
>> Chris Travers
>> Head of Database
>>
>> Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
>> Saarbrücker Straße 37a, 10405 Berlin
>>
>>
>
> --
> Best Regards,
> Chris Travers
> Head of Database
>
> Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
> Saarbrücker Straße 37a, 10405 Berlin
>
>

-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


racecondition.patch
Description: Binary data


Re: Funny hang on PostgreSQL 10 during parallel index scan on slave

2018-09-06 Thread Chris Travers
On Thu, Sep 6, 2018 at 11:08 AM Chris Travers 
wrote:

> Ok, so here's my current patch (work in progress).  I have not run tests
> yet, and finding a way to add a test case is virtually impossible though I
> expect we will find ways of using gdb to confirm local fix later.
>
> After careful code review, I settled on the following approach which
> seemed to be the least intrusive.
>
> 1.  Change the retry logic so that it does not retry of QueryCancelPending
> or ProcDiePending are set.  In these cases EINTR is passed back to the
> caller
> 2.  Change the error handling logic of the caller so that EINTR is handled
> by the next CHECK_FOR_INTERRUPTS() after cleanup.
>
> This means that the file descriptor is now already closed and that we
> handle this the same way we would with a ENOSPC.  The reason for this is
> that there are many places in the code where it is not clear what degree of
> safety is present for throwing errors using ereport, and the patch needs to
> be as unobtrusive as possible to facilitate back porting.
>
> At this point the patch is for discussion.  I have not even compiled it or
> tested it but maybe there is something wrong with my approach so figured I
> would send it out early.
>
> The major assumptions are:
> 1.  close() will never take longer than the interrupt interval that caused
> the loop to break.
> 2.  close() does not get interrupted in a way that will not cause cleanup
> problems later.
> 3. We will reach the next interrupt check at a reasonable interval and
> safe spot.
>
> Any initial arguments against?
>

The previous patch had two issues found on internal code review here.  I am
sending a new patch.

This patch compiles.  make check passes
make check-world fails but the errors are the same as found on master and
involve ecpg.

We will be doing further internal review and verification and then will run
through pg_indent before final submission.

Changes in this patch:
1.  The previous patch had a backwards check for EINTR in the calling
function.  This has been fixed.
2.  In cases where an ERROR elevel was passed in, the function would return
instead of erroring out when an error occurred.

So in this case we check whether we are expected to raise an error on
failure and, if so, check for interrupts.  If not, we go through the normal
error handling/logging path, which might result in a warning that the shared
memory segment could not be allocated followed by an actual meaningful
error.  I could put that in an else if, if that is seen as a good idea, but
figured I would raise it as a discussion point.


>
> --
> Best Regards,
> Chris Travers
> Head of Database
>
> Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
> Saarbrücker Straße 37a, 10405 Berlin
>
>

-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


racecondition.patch
Description: Binary data


Re: Funny hang on PostgreSQL 10 during parallel index scan on slave

2018-09-06 Thread Chris Travers
Ok, so here's my current patch (work in progress).  I have not run tests
yet, and finding a way to add a test case is virtually impossible though I
expect we will find ways of using gdb to confirm local fix later.

After careful code review, I settled on the following approach which seemed
to be the least intrusive.

1.  Change the retry logic so that it does not retry if QueryCancelPending
or ProcDiePending is set.  In these cases EINTR is passed back to the
caller.
2.  Change the error handling logic of the caller so that EINTR is handled
by the next CHECK_FOR_INTERRUPTS() after cleanup.

This means that the file descriptor is now already closed and that we
handle this the same way we would with an ENOSPC.  The reason for this is
that there are many places in the code where it is not clear what degree of
safety is present for throwing errors using ereport, and the patch needs to
be as unobtrusive as possible to facilitate back porting.

At this point the patch is for discussion.  I have not even compiled it or
tested it but maybe there is something wrong with my approach so figured I
would send it out early.

The major assumptions are:
1.  close() will never take longer than the interrupt interval that caused
the loop to break.
2.  close() does not get interrupted in a way that will cause cleanup
problems later.
3. We will reach the next interrupt check at a reasonable interval and safe
spot.

Any initial arguments against?

-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


racecondition.patch
Description: Binary data


Re: Funny hang on PostgreSQL 10 during parallel index scan on slave

2018-09-05 Thread Chris Travers
On Wed, Sep 5, 2018 at 6:55 PM Andres Freund  wrote:

> Hi,
>
> On 2018-09-05 18:48:44 +0200, Chris Travers wrote:
> > Will submit a patch here shortly.  Thanks!  Should we do for master and
> > 10?  Or 9.6 too?
>
> Please don't top-post on this list.  This needs to be done in all
> branches where the posix_fallocate call is present.
>
> > > Yep,  Maybe we should check for signals there.
> > >
> > > On Wed, Sep 5, 2018 at 5:27 PM Thomas Munro <
> thomas.mu...@enterprisedb.com>
> > > wrote:
> > >
> > >> On Wed, Sep 5, 2018 at 8:23 AM Chris Travers <
> chris.trav...@adjust.com>
> > >> wrote:
> > >> > 1.  The query is in a parallel index scan or similar
> > >> > 2.  A process is executing a parallel plan and allocating a
> significant
> > >> chunk of memory (2MB for example) in dynamic shared memory.
> > >> > 3.  The startup process goes into a loop where it sends a sigusr1,
> > >> sleeps 5m, and sends another sigusr1 etc.
> > >> > 4.  The sigusr1 aborts the system call, which is then retried.
> > >> > 5.  Because the system call takes more than 5ms, we end up in an
> > >> endless loop
>
> What you're presumably encountering here is a recovery conflict.
>

Agreed but the question is how to correct what is a fairly interesting race
condition.

>
>
> > On Wed, Sep 5, 2018 at 6:40 PM Chris Travers 
> > wrote:
> > >> Do you mean this loop in dsm_impl_posix_resize() is getting
> > >> interrupted constantly and never completing?
> > >>
> > >> /* We may get interrupted, if so just retry. */
> > >> do
> > >>     {
> > >> rc = posix_fallocate(fd, 0, size);
> > >> } while (rc == EINTR);
> > >>
>
> Probably worthwile to check that the dsm code is properly robust if
> errors are thrown from within here.
>

Will check that too.  Thanks!

>
>
> Greetings,
>
> Andres Freund
>


-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: Funny hang on PostgreSQL 10 during parallel index scan on slave

2018-09-05 Thread Chris Travers
Will submit a patch here shortly.  Thanks!  Should we do this for master and
10?  Or 9.6 too?

On Wed, Sep 5, 2018 at 6:40 PM Chris Travers 
wrote:

> Yep,  Maybe we should check for signals there.
>
> On Wed, Sep 5, 2018 at 5:27 PM Thomas Munro 
> wrote:
>
>> On Wed, Sep 5, 2018 at 8:23 AM Chris Travers 
>> wrote:
>> > 1.  The query is in a parallel index scan or similar
>> > 2.  A process is executing a parallel plan and allocating a significant
>> chunk of memory (2MB for example) in dynamic shared memory.
>> > 3.  The startup process goes into a loop where it sends a sigusr1,
>> sleeps 5m, and sends another sigusr1 etc.
>> > 4.  The sigusr1 aborts the system call, which is then retried.
>> > 5.  Because the system call takes more than 5ms, we end up in an
>> endless loop
>>
>> Do you mean this loop in dsm_impl_posix_resize() is getting
>> interrupted constantly and never completing?
>>
>> /* We may get interrupted, if so just retry. */
>> do
>> {
>> rc = posix_fallocate(fd, 0, size);
>> } while (rc == EINTR);
>>
>> --
>> Thomas Munro
>> http://www.enterprisedb.com
>>
>
>
> --
> Best Regards,
> Chris Travers
> Head of Database
>
> Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
> Saarbrücker Straße 37a, 10405 Berlin
>
>

-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: Funny hang on PostgreSQL 10 during parallel index scan on slave

2018-09-05 Thread Chris Travers
Yep,  Maybe we should check for signals there.

On Wed, Sep 5, 2018 at 5:27 PM Thomas Munro 
wrote:

> On Wed, Sep 5, 2018 at 8:23 AM Chris Travers 
> wrote:
> > 1.  The query is in a parallel index scan or similar
> > 2.  A process is executing a parallel plan and allocating a significant
> chunk of memory (2MB for example) in dynamic shared memory.
> > 3.  The startup process goes into a loop where it sends a sigusr1,
> sleeps 5m, and sends another sigusr1 etc.
> > 4.  The sigusr1 aborts the system call, which is then retried.
> > 5.  Because the system call takes more than 5ms, we end up in an endless
> loop
>
> Do you mean this loop in dsm_impl_posix_resize() is getting
> interrupted constantly and never completing?
>
> /* We may get interrupted, if so just retry. */
> do
> {
> rc = posix_fallocate(fd, 0, size);
> } while (rc == EINTR);
>
> --
> Thomas Munro
> http://www.enterprisedb.com
>


-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Funny hang on PostgreSQL 10 during parallel index scan on slave

2018-09-05 Thread Chris Travers
Hi all;

For the last few months we have been facing a funny problem on a slave
where queries go to 100% CPU usage and never finish, causing the recovery
process to hang and the replica to fall behind.  Over time we ruled out a
lot of causes and were banging our heads against this one.  Today we got a
break when we attached a debugger to various processes, even without
debugging symbols.  Not only did we get useful stack traces from the hung
query, but attaching a debugger to the startup process caused the query to
finish.  This has shown up in 10.2 and 10.5.

Based on the stack traces we have concluded the following seems to happen:

1.  The query is in a parallel index scan or similar
2.  A process is executing a parallel plan and allocating a significant
chunk of memory (2MB for example) in dynamic shared memory.
3.  The startup process goes into a loop where it sends a SIGUSR1, sleeps
5ms, and sends another SIGUSR1, etc.
4.  The SIGUSR1 aborts the system call, which is then retried.
5.  Because the system call takes more than 5ms, we end up in an endless
loop.

I see one of two possible solutions here (a rough sketch of the first
follows below).
1.  Exponential backoff in sending signals, to maybe 1s max.
2.  Check for pending signals before retrying the system call (I am not 100%
sure yet where that call is, but I am on my way to finding it).
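
Roughly what I mean by option 1, with hypothetical helper names standing in
for the real recovery-conflict machinery (conflict_still_pending() and
send_conflict_signal() do not exist as such):

    long        delay_ms = 5;           /* current fixed 5ms cadence */

    while (conflict_still_pending())
    {
        send_conflict_signal();             /* today: a SIGUSR1 every 5ms */
        pg_usleep(delay_ms * 1000L);        /* pg_usleep takes microseconds */
        delay_ms = Min(delay_ms * 2, 1000); /* 5, 10, 20, ... capped at 1s */
    }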

Any feedback or thoughts before we look at implementing a patch?
-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: Two proposed modifications to the PostgreSQL FDW

2018-08-22 Thread Chris Travers
On Wed, Aug 22, 2018 at 9:09 AM Masahiko Sawada 
wrote:

> On Wed, Aug 22, 2018 at 1:20 PM Chris Travers 
> wrote:
> > The two things I would suggest is that rather than auto-detecting (if I
> understand your patch correctly) whether prepared transactions are possible
> on the other system, making it  an option to the foreign server or foreign
> table.  Otherwise one might enable prepared transactions for one set of
> operations on one database and have it automatically cause headaches in
> another context.
>
> Yeah, currently it's an option for foreign servers. The patch adds a
> new option "two_phase_commit" to postgres_fdw.
>

Seems sane.

>
> >
> > The other thing I wasn't quite sure about on your patch was what happens
> if, say, someone trips over a power cord while the background worker is
> trying to commit things, whether the information is available on the
> initiating server when it comes back. whether a DBA has to go out and try
> to figure out what needs to be committed remotely, and how this would be
> done.  If you could explain that process, that would be helpful to me.
> >
> > (In my approach these would be recorded on the master and an SQL
> function could re-initiate the background worker.)
>
> My approach is almost the same as yours. For details, in the
> pre-commit we do WAL-logging for each participants server before
> preparing transactions on the remote sides. The WAL has information of
> participants foreign servers(foreign server oid, database oid etc) and
> its global transaction identifier. Even if plug-pulled during trying
> to commit we can recover the global transactions that are not
> completed yet and its participants information from WAL. After the
> recovery users needs to execute the SQL function to fix the
> in-completed global transactions. Since the function can find out
> whether the remote transaction should be committed or rollback-ed by
> checking CLOG. Does my answer make sense?
>

Yeah.  That is probably more elegant than my solution.  I do wonder though
if the next phase would not be to add some sort of hook to automatically
start the background worker in this case.
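
Something like the following is what I have in mind for such a hook (a
sketch only; fdw_twophase_resolver_main is a placeholder entry point and
error handling is abbreviated):

    #include "postmaster/bgworker.h"

    static void
    launch_twophase_resolver(void)
    {
        BackgroundWorker worker;
        BackgroundWorkerHandle *handle;

        memset(&worker, 0, sizeof(worker));
        worker.bgw_flags = BGWORKER_SHMEM_ACCESS |
                           BGWORKER_BACKEND_DATABASE_CONNECTION;
        worker.bgw_start_time = BgWorkerStart_RecoveryFinished;
        worker.bgw_restart_time = BGW_NEVER_RESTART;
        snprintf(worker.bgw_library_name, BGW_MAXLEN, "postgres_fdw");
        snprintf(worker.bgw_function_name, BGW_MAXLEN,
                 "fdw_twophase_resolver_main");
        snprintf(worker.bgw_name, BGW_MAXLEN, "fdw two-phase resolver");
        worker.bgw_notify_pid = MyProcPid;

        if (!RegisterDynamicBackgroundWorker(&worker, &handle))
            ereport(WARNING,
                    (errmsg("could not start two-phase resolver worker")));
    }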

>
> >>
> >>
> >> >
> >> > Moreover since COMMIT PREPARED occurs during the commit hook, not the
> precommit hook, it is too late to roll back the local transaction.  We
> cannot raise errors since this causes a conflict in the commit status of
> the local transaction.  So when we commit the local transaction we commit
> to committing all prepared transactions as soon as possible.  Note some
> changes need to be made to make this usable in the FDW context, so what I
> am hoping is that the dialog helps impact the discussion and options going
> forward.
> >> >
> >> >>
> >> >> Also
> >> >> since we don't want to wait for COMMIT PREPARED to complete we need
> to
> >> >> consider that users could cancel the query anytime. To not break the
> >> >> current semantics we cannot raise error during 2nd phase of two-phase
> >> >> commit but it's not realistic because even the palloc() can raise an
> >> >> error.
> >> >
> >> >
> >> > We don't palloc.  All memory used here is on the stack.  I do allow
> for dramatic precondition checks to cause errors but those should never
> happen absent some other programmer doing something dramatically unsafe
> anyway.  For example if you try to double-commit a transaction set.
> >>
> >> Sorry, palloc() is just an example. I'm not sure all FDWs can
> >> implement all callbacks for two-phase commit without codes that could
> >> emit errors.
> >
> >
> > Yeah, but if you are in the commit hook and someone emits an error,
> that's wrong because that then tries to rollback an already committed
> transaction and the backend rightfully panics.  In fact I should probably
> strip out the precondition check errors there and issue  a warning.  It
> might sometimes happen when something goes seriously wrong on a system
> level, but
>
> In my patch since the commit hook is performed by the background
> worker not by the backends it's no problem if someone emits an error
> in the commit hook. After the backend prepared transactions on the all
> remote side, it enqueue itself to the wait queue. The background
> worker gets the global transaction waiting to be completed and commit
> prepared transaction on all remote side. After completed the global
> transaction the background worker dequeue it.
>

Seems sane.  I was firing off one background worker per global transaction
that needed cleanup.  This might be an area to think abou

Re: Two proposed modifications to the PostgreSQL FDW

2018-08-21 Thread Chris Travers
On Wed, Aug 22, 2018 at 3:12 AM Masahiko Sawada 
wrote:

> On Tue, Aug 21, 2018 at 5:36 PM Chris Travers 
> wrote:
> >
> >
> >
> > On Tue, Aug 21, 2018 at 8:42 AM Masahiko Sawada 
> wrote:
> >>
> >> On Tue, Aug 21, 2018 at 1:47 AM Chris Travers 
> wrote:
> >> >
> >> >
> >> >
> >> > On Mon, Aug 20, 2018 at 4:41 PM Andres Freund 
> wrote:
> >> >>
> >> >> Hi,
> >> >>
> >> >> On 2018-08-20 16:28:01 +0200, Chris Travers wrote:
> >> >> > 2.  TWOPHASECOMMIT=[off|on] option
> >> >>
> >> >> > The second major issue that I see with PostgreSQL's foreign
> database
> >> >> > wrappers is the fact that there is no two phase commit which means
> that a
> >> >> > single transaction writing to a group of tables has no expectation
> that all
> >> >> > backends will commit or rollback together.  With this patch an
> option would
> >> >> > be applied to foreign tables such that they could be set to use
> two phase
> >> >> > commit  When this is done, the first write to each backend would
> register a
> >> >> > connection with a global transaction handler and a pre-commit and
> commit
> >> >> > hooks would be set up to properly process these.
> >> >> >
> >> >> > On recommit a per-global-transaction file would be opened in the
> data
> >> >> > directory and prepare statements logged to the file.  On error, we
> simply
> >> >> > roll back our local transaction.
> >> >> >
> >> >> > On commit hook , we go through and start to commit the remote
> global
> >> >> > transactions.  At this point we make a best effort but track
> whether or not
> >> >> > we were successfully on all.  If successful on all, we delete the
> file.  If
> >> >> > unsuccessful we fire a background worker which re-reads the file
> and is
> >> >> > responsible for cleanup.  If global transactions persist, a SQL
> >> >> > administration function will be made available to restart the
> cleanup
> >> >> > process.  On rollback, we do like commit but we roll back all
> transactions
> >> >> > in the set.  The file has enough information to determine whether
> we should
> >> >> > be committing or rolling back on cleanup.
> >> >> >
> >> >> > I would like to push these both for Pg 12.  Is there any feedback
> on the
> >> >> > concepts and the problems first
> >> >>
> >>
> >> Thank you for the proposal. I agree that it's a major problem that
> >> postgres_fdw (or PostgreSQL core API) doesn't support two-phase
> >> commit.
> >>
> >> >> There's been *substantial* work on this. You should at least read the
> >> >> discussion & coordinate with the relevant developers.
> >> >
> >> >
> >> > I suppose I should forward this to them directly also.
> >> >
> >> > Yeah.   Also the transaction manager code for this I wrote while
> helping with a proof of concept for this copy-to-remote extension.
> >> >
> >> > There are a few big differences in implementation with the patches
> you mention and the disagreement was part of why I thought about going this
> direction.
> >> >
> >> > First, discussion of differences in implementation:
> >> >
> >> > 1.  I treat the local and remote transactions symmetrically and I
> make no assumptions about what might happen between prepare and an
> attempted local commit.
> >> >prepare goes into the precommit hook
> >> >commit goes into the commit hook and we never raise errors if it
> fails (because you cannot rollback at that point).  Instead a warning is
> raised and cleanup commences.
> >> >rollback goes into the rollback hook and we never raise errors if
> it fails (because you are already rolling back).
> >> >
> >> > 2.  By treating this as a property of a table rather than a property
> of a foreign data wrapper or a server, we can better prevent prepared
> transactions where they have not been enabled.
> >> >This also ensures that we know whether we are guaranteeing two
> phase commit or not by looking at the table.
> >> >
> >> > 3.  By making this opt-in it avoids a lot of problems with regard

Re: Two proposed modifications to the PostgreSQL FDW

2018-08-21 Thread Chris Travers
On Mon, Aug 20, 2018 at 4:44 PM Tom Lane  wrote:

> Chris Travers  writes:
> > I am looking at trying to make two modifications to the PostgreSQL FDW
> and
> > would like feedback on this before I do.
>
> > 1.  INSERTMETHOD=[insert|copy] option on foreign table.
>
> > One significant limitation of the PostgreSQL FDW is that it does a
> prepared
> > statement insert on each row written which imposes a per-row latency.
> This
> > hits environments where there is significant latency or few latency
> > guarantees particularly hard, for example, writing to a foreign table
> that
> > might be physically located on another continent.  The idea is that
> > INSERTMETHOD would default to insert and therefore have no changes but
> > where needed people could specify COPY which would stream the data out.
> > Updates would still be unaffected.
>
> It seems unlikely to me that an FDW option would be at all convenient
> for this.  What about selecting it dynamically based on the planner's
> estimate of the number of rows to be inserted?
>
> A different thing we could think about is enabling COPY TO/FROM a
> foreign table.
>

Actually, as I start to understand some aspects of Andres's concern above,
there are issues beyond the number of rows.  But yes, selecting dynamically
would be preferable.

Two major things I think we cannot support with this are RETURNING clauses
and ON CONFLICT clauses.  So anywhere we need to worry about those, a COPY
node could not be used.  So it is more complex than merely row estimates.

>
> > 2.  TWOPHASECOMMIT=[off|on] option
>
> > The second major issue that I see with PostgreSQL's foreign database
> > wrappers is the fact that there is no two phase commit which means that a
> > single transaction writing to a group of tables has no expectation that
> all
> > backends will commit or rollback together.  With this patch an option
> would
> > be applied to foreign tables such that they could be set to use two phase
> > commit  When this is done, the first write to each backend would
> register a
> > connection with a global transaction handler and a pre-commit and commit
> > hooks would be set up to properly process these.
>
> ENOINFRASTRUCTURE ... and the FDW pieces of that hardly seem like the
> place to start.
>

I disagree about the lack of infrastructure.  We have every piece of
infrastructure we need for a minimum viable offering.
1.  Two Phase Commit in PostgreSQL
2.  Custom Background Workers
3.  Pre/Post Commit/Rollback hooks for callbacks.

Those are sufficient to handle the vast majority of error cases.
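
For point 3, the wiring I have in mind looks roughly like this (a sketch
only; the fdw_twophase_* helpers are placeholders for the transaction
manager code, not existing functions):

    #include "access/xact.h"

    extern void fdw_twophase_prepare_all(void);   /* PREPARE TRANSACTION on remotes */
    extern void fdw_twophase_commit_all(void);    /* COMMIT PREPARED, warn on failure */
    extern void fdw_twophase_rollback_all(void);  /* ROLLBACK PREPARED, never error */

    static void
    fdw_xact_callback(XactEvent event, void *arg)
    {
        switch (event)
        {
            case XACT_EVENT_PRE_COMMIT:
                fdw_twophase_prepare_all();
                break;
            case XACT_EVENT_COMMIT:
                fdw_twophase_commit_all();
                break;
            case XACT_EVENT_ABORT:
                fdw_twophase_rollback_all();
                break;
            default:
                break;
        }
    }

    void
    fdw_twophase_init(void)
    {
        RegisterXactCallback(fdw_xact_callback, NULL);
    }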

The one thing we *might* want that we don't have is a startup process to
scan a directory of background worker status files and fire off appropriate
background workers on database start.  That hardly seems difficult though.


>
> regards, tom lane
>


-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: Two proposed modifications to the PostgreSQL FDW

2018-08-21 Thread Chris Travers
On Tue, Aug 21, 2018 at 8:42 AM Masahiko Sawada 
wrote:

> On Tue, Aug 21, 2018 at 1:47 AM Chris Travers 
> wrote:
> >
> >
> >
> > On Mon, Aug 20, 2018 at 4:41 PM Andres Freund 
> wrote:
> >>
> >> Hi,
> >>
> >> On 2018-08-20 16:28:01 +0200, Chris Travers wrote:
> >> > 2.  TWOPHASECOMMIT=[off|on] option
> >>
> >> > The second major issue that I see with PostgreSQL's foreign database
> >> > wrappers is the fact that there is no two phase commit which means
> that a
> >> > single transaction writing to a group of tables has no expectation
> that all
> >> > backends will commit or rollback together.  With this patch an option
> would
> >> > be applied to foreign tables such that they could be set to use two
> phase
> >> > commit  When this is done, the first write to each backend would
> register a
> >> > connection with a global transaction handler and a pre-commit and
> commit
> >> > hooks would be set up to properly process these.
> >> >
> >> > On recommit a per-global-transaction file would be opened in the data
> >> > directory and prepare statements logged to the file.  On error, we
> simply
> >> > roll back our local transaction.
> >> >
> >> > On commit hook , we go through and start to commit the remote global
> >> > transactions.  At this point we make a best effort but track whether
> or not
> >> > we were successfully on all.  If successful on all, we delete the
> file.  If
> >> > unsuccessful we fire a background worker which re-reads the file and
> is
> >> > responsible for cleanup.  If global transactions persist, a SQL
> >> > administration function will be made available to restart the cleanup
> >> > process.  On rollback, we do like commit but we roll back all
> transactions
> >> > in the set.  The file has enough information to determine whether we
> should
> >> > be committing or rolling back on cleanup.
> >> >
> >> > I would like to push these both for Pg 12.  Is there any feedback on
> the
> >> > concepts and the problems first
> >>
>
> Thank you for the proposal. I agree that it's a major problem that
> postgres_fdw (or PostgreSQL core API) doesn't support two-phase
> commit.
>
> >> There's been *substantial* work on this. You should at least read the
> >> discussion & coordinate with the relevant developers.
> >
> >
> > I suppose I should forward this to them directly also.
> >
> > Yeah.   Also the transaction manager code for this I wrote while helping
> with a proof of concept for this copy-to-remote extension.
> >
> > There are a few big differences in implementation with the patches you
> mention and the disagreement was part of why I thought about going this
> direction.
> >
> > First, discussion of differences in implementation:
> >
> > 1.  I treat the local and remote transactions symmetrically and I make
> no assumptions about what might happen between prepare and an attempted
> local commit.
> >prepare goes into the precommit hook
> >commit goes into the commit hook and we never raise errors if it
> fails (because you cannot rollback at that point).  Instead a warning is
> raised and cleanup commences.
> >rollback goes into the rollback hook and we never raise errors if it
> fails (because you are already rolling back).
> >
> > 2.  By treating this as a property of a table rather than a property of
> a foreign data wrapper or a server, we can better prevent prepared
> transactions where they have not been enabled.
> >This also ensures that we know whether we are guaranteeing two phase
> commit or not by looking at the table.
> >
> > 3.  By making this opt-in it avoids a lot of problems with regards to
> incorrect configuration etc since if the DBA says "use two phase commit"
> and failed to enable prepared transactions on the other side...
> >
> > On to failure modes:
> >  1.  Its possible that under high load too many foreign transactions are
> prepared and things start rolling back instead of committing.  Oh well
> >  2.  In the event that a foreign server goes away between prepare and
> commit, we continue to retry via the background worker.  The background
> worker is very pessimistic and checks every remote system for the named
> transaction.
>
> If some participant servers fail during COMMIT PREPARED, will the
> client get a "committed"? or an "aborted"? If the

Re: Two proposed modifications to the PostgreSQL FDW

2018-08-20 Thread Chris Travers
For the record, here's the proof of concept code for the transaction
manager which works off libpq connections.

It is not ready yet by any means, but it is included for design discussion.
If the previous patch gets in instead, that's fine; I just figure it is
worth including here for that purpose.

Two things which are currently missing are a) an ability to specify in the
log file where the cleanup routine is located for a background worker and
b) a separation of backend responsibility for restarting cleanup efforts on
server start.

>
>>
>>
>
> --
> Best Regards,
> Chris Travers
> Head of Database
>
> Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
> Saarbrücker Straße 37a, 10405 Berlin
>
>

-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


twophasecommit.h
Description: Binary data


twophasecommit.c
Description: Binary data


Re: Two proposed modifications to the PostgreSQL FDW

2018-08-20 Thread Chris Travers
On Mon, Aug 20, 2018 at 4:41 PM Andres Freund  wrote:

> Hi,
>
> On 2018-08-20 16:28:01 +0200, Chris Travers wrote:
> > 1.  INSERTMETHOD=[insert|copy] option on foreign table.
> >
> > One significant limitation of the PostgreSQL FDW is that it does a
> prepared
> > statement insert on each row written which imposes a per-row latency.
> This
> > hits environments where there is significant latency or few latency
> > guarantees particularly hard, for example, writing to a foreign table
> that
> > might be physically located on another continent.  The idea is that
> > INSERTMETHOD would default to insert and therefore have no changes but
> > where needed people could specify COPY which would stream the data out.
> > Updates would still be unaffected.
>
> That has a *lot* of semantics issues, because you suddenly don't get
> synchronous error reports anymore.  I don't think that's OK on a
> per-table basis.  If we invented something like this, it IMO should be a
> per-statement explicit opt in that'd allow streaming.
>

I want to push back a bit and justify the per table decision here.

This is primarily useful when two things are present:
1.  There is significant lag between the servers
2.  There are significant bulk operations on the table.

Only when both are present does it make sense to use this option.  If you
have a low latency connection, don't use it.  If you are only writing a few
rows here and there, don't use it.  If you have a high latency connection
*and* you are inserting large numbers of rows at a time, then this is where
you run into a problem and the current FDW approach is, frankly, unusable
in these cases.  These characteristics follow the server (and hence the
table) in the first case and the table only in the second case.  Therefore
the characteristics of the foreign table determine whether it makes sense
to consider this option.  Doing this on the statement level poses more
problems, in my view, than it fixes.

Moreover the idea of batching inserts doesn't help either (and actually
would be worse in many cases than the current approach) since the current
approach uses a prepared statement which gets called for each row.Doing
a batch insert might save you on the per-row round trip but it adds a great
deal of complexity to the overall operation.

Now, a per-statement opt-in poses a number of questions.  Obviously we'd
have to allow this on local tables too, but it would have no effect.  So
suddenly you have a semantic construct which *may* or *may not* pose the
problems you mention, and in fact may or may not pose the problem on foreign
tables, because whether the construct does anything depends on the FDW
driver.  I don't see what exactly one buys by shifting the problem in this
way, given that when this is useful it will be used on virtually all inserts
and when it is not useful it won't be used at all.

Now, the entire problem is the expectation of synchronous errors on a
per-row basis.  This is what causes the problem with inserting a million rows
over a transcontinental link (insert row, check error, insert row, check
error, etc.).

So, on to the question of containing the problems.  I think the best way to
contain this is to have one COPY statement per insert planner node, which
ensures that the basic guarantee that a planner node succeeds or errors is
met.  For sequences, I would expect that supplying a column list without the
relevant sequence-generated column should behave in a tolerable way
(otherwise two concurrent copies would have problems).

So that's the primary defense.  Now let's look at the main alternatives.

1.  You can use a table where you logically replicate inserts, and where
you insert then truncate or delete.  This works fine until you have enough
WAL traffic that you cannot tolerate a replication outage.
2.  You can write an extension which allows you to COPY an existing table
into a foreign table.  The problem here is that SPI has a bunch of
protections to keep you from copying to stdout, so when I helped with this
proof of concept we copied to a temp file on a ramdisk and copied from
there, which is frankly quite a bit worse than this approach.

If you want a per statement opt-in, I guess the questions that come to my
mind are:

1.  What would this opt-in look like?
2.  How important is it that we avoid breaking current FDW interfaces to
implement it?
3.  How would you expect it to behave if directed at a local table or a
foreign table on, say, a MySQL db?

>
>
> > 2.  TWOPHASECOMMIT=[off|on] option
>
> > The second major issue that I see with PostgreSQL's foreign database
> > wrappers is the fact that there is no two phase commit which means that a
> > single transaction writing to a group of tables has no expectation that
> all
> > backends will commit or rollback together.  With this patch an option
> would
> 

Two proposed modifications to the PostgreSQL FDW

2018-08-20 Thread Chris Travers
Hi all;

I am looking at trying to make two modifications to the PostgreSQL FDW and
would like feedback on this before I do.

1.  INSERTMETHOD=[insert|copy] option on foreign table.

One significant limitation of the PostgreSQL FDW is that it does a prepared
statement insert on each row written which imposes a per-row latency.  This
hits environments where there is significant latency or few latency
guarantees particularly hard, for example, writing to a foreign table that
might be physically located on another continent.  The idea is that
INSERTMETHOD would default to insert and therefore have no changes but
where needed people could specify COPY which would stream the data out.
Updates would still be unaffected.

2.  TWOPHASECOMMIT=[off|on] option

The second major issue that I see with PostgreSQL's foreign database
wrappers is the fact that there is no two phase commit which means that a
single transaction writing to a group of tables has no expectation that all
backends will commit or rollback together.  With this patch an option would
be applied to foreign tables such that they could be set to use two-phase
commit.  When this is done, the first write to each backend would register a
connection with a global transaction handler, and pre-commit and commit
hooks would be set up to properly process these.
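
As a rough sketch of the intended usage (the twophasecommit option is the one
proposed here, not existing syntax, and all object names are examples;
whether the option should instead, or also, attach to the server is open):

-- Hypothetical: mark the foreign tables as requiring two-phase commit.
ALTER FOREIGN TABLE remote_invoices OPTIONS (ADD twophasecommit 'on');
ALTER FOREIGN TABLE remote_journal  OPTIONS (ADD twophasecommit 'on');

-- A local transaction writing to both would register each remote connection
-- with the global transaction handler, so all remote backends prepare before
-- the local commit and then commit (or roll back) together.
BEGIN;
INSERT INTO remote_invoices (id, amount) VALUES (1, 100.00);
INSERT INTO remote_journal  (id, note)   VALUES (1, 'invoice 1 posted');
COMMIT;

Naturally this relies on the remote servers having max_prepared_transactions
set high enough to hold the prepared transactions.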

On pre-commit, a per-global-transaction file would be opened in the data
directory and the prepare statements logged to the file.  On error, we simply
roll back our local transaction.

In the commit hook, we go through and start to commit the remote global
transactions.  At this point we make a best effort, but track whether or not
we were successful on all of them.  If successful on all, we delete the
file.  If unsuccessful, we fire a background worker which re-reads the file
and is responsible for cleanup.  If global transactions persist, a SQL
administration function will be made available to restart the cleanup
process.  On rollback, we do as with commit but roll back all transactions
in the set.  The file has enough information to determine whether we should
be committing or rolling back on cleanup.
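
For illustration, the statements the hooks would issue on each remote
connection are just the existing two-phase commit commands; the global
transaction identifier below is made up:

-- Pre-commit, for each registered remote connection:
PREPARE TRANSACTION 'pgfdw_gxact_42_1';

-- Commit hook, best effort across all remote connections:
COMMIT PREPARED 'pgfdw_gxact_42_1';

-- On rollback, or later from the cleanup background worker reading the file:
ROLLBACK PREPARED 'pgfdw_gxact_42_1';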

I would like to push both of these for Pg 12.  Is there any feedback on the
concepts and the problems first?

-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: How can we submit code patches that implement our (pending) patents?

2018-07-27 Thread Chris Travers
On Fri, Jul 27, 2018, 19:01 Andres Freund  wrote:

> Hi,
>
> On 2018-07-27 11:15:00 -0500, Nico Williams wrote:
> > Even assuming you can't change the PG license, you could still:
> >
> >  - require disclosure in contributions
>
> That really has no upsides, except poison the area.  Either we reject
> the patch and people doing so can reasonably be expected to know about
> the patents, making further contributions by them worse.  Or we accept
> the patch, and the listed patents make the commercial offerings harder,
> further development more dubious, readers can reasonably be concerned
> about being considered do know about the patents in independent
> projects.
>

What about just requiring a patent license that grants all rights necessary
to fully enjoy the copyright license?  That avoids the need to change the
license formally, and it would just clarify that we expect that patents do
NOT change the license.  Not that I expect there would be takers.

>
> Greetings,
>
> Andres Freund
>


Re: How can we submit code patches that implement our (pending) patents?

2018-07-26 Thread Chris Travers
On Wed, Jul 4, 2018 at 2:28 AM Tsunakawa, Takayuki <
tsunakawa.ta...@jp.fujitsu.com> wrote:

> Hello,
>
> As I asked at the PGCon developer meeting this year, we'd like to offer
> our company's patents and patent applications license to the
> PostgreSQL community free of charge.  If I heard correctly at that time, we
> could continue this discussion during the unconference, but I missed that
> opportunity (I'm sorry).  So,  please let me continue the consultation
> here.  If some other mailing list is appropriate such as pgsql-core, let me
> know (but I hope open discussion will lead to better and fair ideas and
> conclusion.)
>
>
> There are three ideas.  Is there any effective idea?
>
>
>
> (1)
> Share patents through a patent pool for open source communities.  Our
> company, Fujitsu Limited, is a licensee member of Open Invention Network
> (OIN).  And PostgreSQL is the protection target of OIN as listed
> here:
>
>
> http://www.openinventionnetwork.com/joining-oin/linux-system/linux-system-table/?cat_id=14=table
>
> Google, IBM, Red Hat, Toyota, and other big names are the big sponsors.
> The basic membership is free.
>
>
> (2)
> For each patch we submit to the community that implements our patents,
> include in the mail body something like "3. Grant of Patent License" in
> Apache License 2.0:
>
> Apache License, Version 2.0
> https://www.apache.org/licenses/LICENSE-2.0
>
>
> (3)
> Put up a page on our company web site that's similar to Red Hat's "Patent
> Promise", which is restricted to PostgreSQL.
>
>
> FYI, I've consulted SFLC (Software Freedom Law Center) about this matter,
> and I've just got a reply "We'll be in touch."  I'm waiting for the next
> response.
>

Some fresh thoughts on the top-level matter.

I think the general concern of everyone here is that the PostgreSQL
copyright license is a very broad license.  It allows people to take the
code, build new products possibly unrelated, using the code, and then
commercialize them.  While a patent license that grants everything needed
to make full use of the copyright license is not quite the same as just
handing the patents over to the public domain, it seems likely that this is
close enough to be indistinguishable from the perspective of your business.

For example a competitor of yours could copy the relevant pieces of the
PostgreSQL code, refactor this into a library, and then use it as a
derivative work and this would be entirely within the copyright license.
They could then license that library out, and you could not use your
patents or copyrights to stop them.  As such, a patent license would
probably be very similar from a business perspective to a global public
grant even though the two strike me as something which might not offer
protection in the case of a clean-room implementation.

I certainly wouldn't be opposed to accepting patents where a license was
granted to the PostgreSQL Global Development Group and the community as a
whole to make full use of the patents in any way covered by the copyright
license of PostgreSQL (i.e. where any use that involves utilizing the
copyright license for PostgreSQL extends to the patents in question).  But
I am not sure that a business case could be made for releasing any patents
under such a license, since it means that for anyone else, using the patents
even in commercial software becomes trivial and enforcement would become
very difficult indeed.

-- 
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: How can we submit code patches that implement our (pending) patents?

2018-07-26 Thread Chris Travers
On Thu, Jul 26, 2018 at 9:33 AM Oleksandr Shulgin <
oleksandr.shul...@zalando.de> wrote:

> On Wed, Jul 25, 2018 at 8:35 PM Nico Williams 
> wrote:
>
>>
>> What are you proposing anyways?  That every commit come with a patent
>> search?
>>
>
> I would propose that everyone wasting their time and effort to discuss
> this issue here, would rather spend that time working towards putting an
> end to software patents instead.
> That would be a much better use of the time, IMHO.
>
>
What about adding an extra line to the license indicating that the
copyright owners also grant all patent licenses which are both in their
power to grant and needed for exercise of the copyright license, and
requiring that new code contributions use this license with the extra clause?

Now, no patent owner is likely to accept those terms, but then we can save
our breath for later discussion.

Strictly speaking, I also don't think it is the same thing as a global
public license for the patent, since it would only apply to cases where
there was a copyright license tied to the PostgreSQL source code
contributed by the patent holder.  So you still have taint problems, but
probably not to the same extent.

-- 
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Documenting that queries can be run over replication protocol

2018-07-24 Thread Chris Travers
Hi;

In the process of building some tooling based on replication, we discovered
that PostgreSQL 10 uses COPY to populate the initial table contents for
logical replication; see commit 7c4f52409a8c7d85ed169bbbc1f6092274d03920.

This is not currently documented.  Attached is a patch which adds that
documentation to the logical replication part of the protocol section.
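
For anyone who wants to see this by hand: a logical walsender connection
accepts ordinary SQL alongside the replication commands.  Something along
these lines works (the database and table names are examples):

-- Connect in logical walsender mode, e.g.:
--   psql "dbname=mydb replication=database"
-- Both replication commands and plain SQL are accepted on that connection:
IDENTIFY_SYSTEM;
SELECT count(*) FROM public.my_table;   -- ordinary query
COPY public.my_table TO STDOUT;         -- roughly what the initial table sync issues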



-- 
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


logical_replication_sql_doc.diff
Description: Binary data


Re: How can we submit code patches that implement our (pending) patents?

2018-07-13 Thread Chris Travers
On Sat, Jul 7, 2018 at 9:01 PM Andres Freund  wrote:

> Hi,
>
> On 2018-07-07 20:51:56 +0200, David Fetter wrote:
> > As to "dual license," that's another legal thicket in which we've been
> > wise not to involve ourselves. "Dual licensing" is generally used to
> > assert proprietary rights followed immediately by a demand for
> > payment. This is a thing we don't want to do, and it's not a thing we
> > should be enabling others to do as part of our project.  If they wish
> > to do that, they're welcome to do it without our imprimatur.
>
> This is pure FUD.  Obviously potential results of dual licensing depends
> on the license chosen. None of what you describe has anything to do with
> potential pieces of dual PG License / Apache 2.0 licensed code in PG, or
> anything similar. You could at any time choose to only use /
> redistribute postgres, including derivatives, under the rights either
> license permits.
>
> I think there's fair arguments to be made that we do not want to go fo
> for dual licensing with apache 2.0. Biggest among them that the current
> situation is the established practice. But let's have the arguments be
> real, not FUD.
>

First, generally, dual licensing is used to assert proprietary rights, and
that's actually the issue here (the scope of the patent-holder's rights) and
the fact that it would change future code use in some not very nice ways.
The major exception I know of is the Perl situation, where you have the dual
GPL v2/Artistic License release, but I am not entirely sure of the
historical reasons for this.

The problem here is that the scope of a patent right is different, and this
would effectively mean the Apache License is the real license of the
product.

I assume we agree that the PostgreSQL license plus a patent encumbrance
would be the same as the scope of the patent license, not the scope of the
copyright license.

>
> Andres
>
>

-- 
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: utilities to rebuild commit logs from wal

2018-06-24 Thread Chris Travers
On Sun, Jun 24, 2018 at 5:53 AM Bruce Momjian  wrote:

> On Fri, Jun 22, 2018 at 10:49:58AM +0200, Chris Travers wrote:
> > Before we reinvent the wheel here, does anyone know of utilities to build
> > commit logs from wal segments?  Or even to just fill with commits?
> >
> > I figure it is worth asking before I write one.
>
> Uh, what are commit log?  pg_waldump output?
>

I was missing pg_xact segments and wanted to rebuild them following a
major filesystem crash.

I ended up just copying in "all commits" because rollbacks were very rare on
these dbs.  I am still curious whether there is an easy way, if one is
missing a late segment, to regenerate it from WAL, provided enough WALs are
stored and the WAL level is high enough.

It's not the end of the world, but this has helped me get data about 10
hours more recent than the latest backup.  So far, so good.

>
> --
>   Bruce Momjian  http://momjian.us
>   EnterpriseDB http://enterprisedb.com
>
> + As you are, so once was I.  As I am, so you will be. +
> +      Ancient Roman grave inscription +
>


-- 
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin

