Re: COPY table FROM STDIN via SPI

2023-07-11 Thread James Sewell
>
> No.  It'd be a wire protocol break, and even if it weren't I would not
> expect many clients to be able to deal with it.  They're in the middle
> of a query cycle (for the SELECT or CALL that got you into SPI), and
> suddenly the backend asks for COPY data?  What are they supposed to
> send, or where are they supposed to put it for the COPY-out case?
> There's just not provision for nesting protocol operations like that.
>

What about running a COPY directly from C - is that possible?


COPY table FROM STDIN via SPI

2023-07-11 Thread James Sewell
Is $subject possible?

 I feel like maybe the answer is no, but then I can also see some backend
code for similar things in copy.h.

Perhaps it’s possible via a function call not sending the SQL?

- James


Virtual tx id

2022-09-20 Thread James Sewell
Hello Hackers!

Is it possible to get the current virtual txid from C somehow?

I've looked through the code, but can't seem to find anything other than
getting a NULL when there is no (real) xid assigned. Maybe I'm missing
something?

Cheers,
James


Re: Threading in BGWorkers (!)

2020-07-30 Thread James Sewell
>
> I see no good reason to believe that the signal handler issue is the only
> one.  Even if it is,
> not being able to call any postgres infrastructure is a pretty huge
> handicap.
>

(changed emails to get rid of the nasty employer notice...)

It's at least a workable handicap that I'm happy to deal with.

I can say with 100% confidence that people coming from non C languages will
be using threading in Postgres backends as interop matures (and it's
happening fast now). A lot of the time they won't even know they are using
threads as it will be done by libraries they make use of transparently.

Let's help them to avoid unsafe code now, not wait until they show up on
this list with a critical failure and tap at the big sign that says "NO
THREADING".

- james


Re: Threading in BGWorkers (!)

2020-07-28 Thread James Sewell
> We need some kind of concrete plan for what is a
> usable amount of functionality and what has to be done to get it.
>

This is exactly the type of discussion I'm after.

I'll start.

A usable amount of functionality would be the ability to start threads from:

   - a background worker

These cases should be bound by *at least* the following rules:

   - no signal handling from threads
   - no calls into PostgreSQL functions from threads


The patch I supplied is one of the requirements to get there - I would love
help to discover the others.

-- 
The contents of this email are confidential and may be subject to legal or 
professional privilege and copyright. No representation is made that this 
email is free of viruses or other defects. If you have received this 
communication in error, you may not copy or distribute any part of it or 
otherwise disclose its contents to anyone. Please advise the sender of your 
incorrect receipt of this correspondence.


Re: Threading in BGWorkers (!)

2020-07-02 Thread James Sewell
Hackers,

In the hope of this not being derailed by larger/more unpopular pieces of
work, I'm attaching a tiny patch which I don't believe will have any
negative impact - but will remove one blocker for $subject (sigprocmask
usage is "unspecified" in multithreaded code [1]).

The patch replaces *sigprocmask *with *pthread_sigmask*. They have
identical APIs ("[pthread_sigmask] shall be equivalent to sigprocmask(),
without the restriction that the call be made in a single-threaded
process"[1])

The rationale here is that as far as I can tell this is the **only**
blocker to using multithreaded code in a BGWorker which can't be avoided by
adhering to strict code rules (eg: no PG calls from non-main threads, no
interaction with signals from non-main threads).

Before this went in the rules would need to be agreed upon and documented -
but hopefully it's at least a way forward / a way to progress this
discussion.

Cheers,
James

[1]
https://pubs.opengroup.org/onlinepubs/9699919799/functions/sigprocmask.html

-- 
The contents of this email are confidential and may be subject to legal or 
professional privilege and copyright. No representation is made that this 
email is free of viruses or other defects. If you have received this 
communication in error, you may not copy or distribute any part of it or 
otherwise disclose its contents to anyone. Please advise the sender of your 
incorrect receipt of this correspondence.


pthread_sigmask.patch
Description: Binary data


Re: Threading in BGWorkers (!)

2020-06-23 Thread James Sewell
On Tue, 23 Jun 2020 at 17:26, Konstantin Knizhnik 
wrote:

> On 23.06.2020 10:15, James Sewell wrote:
>
> Using multithreading in bgworker is possible if you do not use any
>> Postgres runtime inside thread procedures or do it in exclusive critical
>> section.
>> It is not so convenient but possible. The most difficult thing from my
>> point of view is error reporting.
>>
>
> Happy to be proved wrong, but I don't think this is correct.
>
> PostgreSQL can call sigprocmask() in your BGWorker whenever it wants, and
> "The use of sigprocmask() is unspecified in a multithreaded process" [1]
>
>
> Sorry, may be I missed something.
> But in my bgworker I am not using Postgres runtime at all (except initial
> bgworker startup code).
> So I am not using latches (which are based on signals), snapshots,...
> In my case bgworker has no connection to Postgres at all.
> Yes, it can still receives signals from Postmaster (SIGTERM, SIGHUP). But
> their handler are trivial and do not need to mask any signals.
>
> So may be in general case combination of signals and threads may cause
> some problems,
> but it doesn't mean that you can't create multithreaded bgworker.
>

Ah yes - sorry *I* missed something.

A multi threaded BGWorker which accesses shared memory and database via
SPI.

-- 
The contents of this email are confidential and may be subject to legal or 
professional privilege and copyright. No representation is made that this 
email is free of viruses or other defects. If you have received this 
communication in error, you may not copy or distribute any part of it or 
otherwise disclose its contents to anyone. Please advise the sender of your 
incorrect receipt of this correspondence.


Re: Threading in BGWorkers (!)

2020-06-23 Thread James Sewell
On Tue, 23 Jun 2020 at 17:15, James Sewell 
wrote:

> Using multithreading in bgworker is possible if you do not use any
>> Postgres runtime inside thread procedures or do it in exclusive critical
>> section.
>> It is not so convenient but possible. The most difficult thing from my
>> point of view is error reporting.
>>
>
> Happy to be proved wrong, but I don't think this is correct.
>
> PostgreSQL can call sigprocmask() in your BGWorker whenever it wants, and
> "The use of sigprocmask() is unspecified in a multithreaded process" [1]
>
> [1] https://pubs.opengroup.org/onlinepubs/9699919799/
>

Sorry link should be [1]
https://pubs.opengroup.org/onlinepubs/9699919799/functions/sigprocmask.html

-- 
The contents of this email are confidential and may be subject to legal or 
professional privilege and copyright. No representation is made that this 
email is free of viruses or other defects. If you have received this 
communication in error, you may not copy or distribute any part of it or 
otherwise disclose its contents to anyone. Please advise the sender of your 
incorrect receipt of this correspondence.


Re: Threading in BGWorkers (!)

2020-06-23 Thread James Sewell
> Using multithreading in bgworker is possible if you do not use any
> Postgres runtime inside thread procedures or do it in exclusive critical
> section.
> It is not so convenient but possible. The most difficult thing from my
> point of view is error reporting.
>

Happy to be proved wrong, but I don't think this is correct.

PostgreSQL can call sigprocmask() in your BGWorker whenever it wants, and
"The use of sigprocmask() is unspecified in a multithreaded process" [1]

[1] https://pubs.opengroup.org/onlinepubs/9699919799/

-- 
The contents of this email are confidential and may be subject to legal or 
professional privilege and copyright. No representation is made that this 
email is free of viruses or other defects. If you have received this 
communication in error, you may not copy or distribute any part of it or 
otherwise disclose its contents to anyone. Please advise the sender of your 
incorrect receipt of this correspondence.


Re: Threading in BGWorkers (!)

2020-06-23 Thread James Sewell
On Tue, 23 Jun 2020 at 13:38, Tom Lane  wrote:

> James Sewell  writes:
> > I was talking about PostgreSQL and threading on IRC the other day -
> which I
> > know is a frowned upon topic - and just wanted to frame the same
> questions
> > here and hopefully get a discussion going.
>
> I think the short answer about threading in bgworkers (or any other
> backend process) is "we don't support it; if you try it and it breaks,
> which it likely will, you get to keep both pieces".


I'm hoping that from this a set of rules rather than a blanket ban can be
agreed upon.


> I'm not sure that
> there's any merit in making small dents in that policy.  I suspect that
> at some point, somebody will try to move those goalposts a long way,
> but it will be a large and controversial patch.
>

Understood, and I do agree with keeping the policy simple - but it looks
like (potentially) the only blocker is a one line change to swap
out sigprocmask. From my perspective this is a very large win - I'll do
some testing.

Why do you want threads in a bgworker anyway?  You could spawn multiple
> bgworkers, or you could dispatch the threaded work to a non-Postgres-ish
> process as PL/Java does.  The only advantage I can see of doing work in a
> process that's not at arm's-length is to have access to PG computational
> or IPC facilities, and none of that is likely to work safely in a threaded
> context.
>

I'm writing the workers in Rust - it would be nice to be able to safely
access Rust crates which make use of threads.

cheers,
James

-- 
The contents of this email are confidential and may be subject to legal or 
professional privilege and copyright. No representation is made that this 
email is free of viruses or other defects. If you have received this 
communication in error, you may not copy or distribute any part of it or 
otherwise disclose its contents to anyone. Please advise the sender of your 
incorrect receipt of this correspondence.


Threading in BGWorkers (!)

2020-06-22 Thread James Sewell
Hi hackers,

I was talking about PostgreSQL and threading on IRC the other day - which I
know is a frowned upon topic - and just wanted to frame the same questions
here and hopefully get a discussion going.

On IRC RhodiumToad offered the following advice (after a standard there be
dragons here disclaimer, as well as noting this may not be a complete list):

Threading (may) be safe if:

   1. all signals will be delivered on the main thread and nowhere else
   2. no postgres function will ever be called from anything that's not the
   main thread
   3. it's safe for postgres to call any system library function, even ones
   explicitly marked as not thread safe
   4. it's safe for postgres to call sigprocmask()

I can live with 1. and 2 - they are fairly easy as long as you know the
rules.

3. needs to be converted to a list of possible calls - which can be done
and checked, I suppose against the POSIX standards?

4. is not fine (I suppose this is a specific example of 3.), as I think
Postgres would need to be using  pthread_sigmask() instead - given looks
like a one line change could  pthread_sigmask be used when available?

Are there any other rules which need to be adhered to?

Any thoughts, comments, dire warnings, hand waving? On IRC the general
thought was that any changes could be seen as encouraging threading which
is a bad thing - I would argue if you're writing BGWorkers which have SPI
access you've already got a pretty large area to screw things up in anyway
(if you aren't following the standards / code comments).

James

-- 
The contents of this email are confidential and may be subject to legal or 
professional privilege and copyright. No representation is made that this 
email is free of viruses or other defects. If you have received this 
communication in error, you may not copy or distribute any part of it or 
otherwise disclose its contents to anyone. Please advise the sender of your 
incorrect receipt of this correspondence.


Re: [HACKERS] Multiple synchronous_standby_names rules

2020-06-08 Thread James Sewell
On Thu, 12 Jan 2017 at 12:06, Michael Paquier 
wrote:

> On Thu, Jan 12, 2017 at 9:53 AM, James Sewell 
> wrote:
> > What is needed to support this is the ability to configure Px with
> something like:
> >
> >  1 (P1, P2, P3), 1 (D1, D2, D3)
> >
> > Would there be any appetite for this - or would it be seen as over
> complication of the current rules?
>
> There have been discussions about being able to do that and there are
> really use cases where that would be useful. As lately quorum commit
> has been committed, we have a better idea of the grammar to use
> (yeah!), though there are a couple of things remaining regarding the
> design of node subsets:
> - How to define group names? Making them mandatory would likely be the
> way to go.
> - How to represent that intuitively in pg_stat_replication? Perhaps
> the answer here is an extra column in this system view.
>

I'm coming back round to this as it's becoming increasingly discussed as we
look into systems with an RPO of (close to) 0 spanning multiple locations.

Before I start hacking - does anyone have any preference for syntax?

To me the best approach seems to be a list of items like:

ANY 2 (a,b) AS SiteA, ANY 2(c,d) AS SiteB
FIRST 2 (a,b,c) AS SiteA, d AS SiteB, e AS SiteC

You need a group name for the replication view as Michael noted, I suppose
you'd need to also allow no group name (one would be autogenerated??) for
backwards compatibility OR enforce group names when there are more than one
elements in the list.

I don't want to do embedded groups like:

ANY 2 (ANY 2 (a,b,c) AS SiteA, d as SiteB, c as SiteB)

Because that just seems like it's overcomplicating the issue.

thoughts?

James

-- 
The contents of this email are confidential and may be subject to legal or 
professional privilege and copyright. No representation is made that this 
email is free of viruses or other defects. If you have received this 
communication in error, you may not copy or distribute any part of it or 
otherwise disclose its contents to anyone. Please advise the sender of your 
incorrect receipt of this correspondence.


Re: Minimal logical decoding on standbys

2020-02-04 Thread James Sewell
Hi all,

This is great stuff! My understanding is that this patch guarantees 0 data
loss for a logical replication stream if the primary crashes and a standby
which was marked as sync at failure time is promoted.

Is this correct?

James
-- 
James Sewell,
Chief Architect

Suite 112, Jones Bay Wharf, 26-32 Pirrama Road, Pyrmont NSW 2009
P (+61) 2 8099 9000  W www.jirotech.com  F (+61) 2 8099 9099

-- 
The contents of this email are confidential and may be subject to legal or 
professional privilege and copyright. No representation is made that this 
email is free of viruses or other defects. If you have received this 
communication in error, you may not copy or distribute any part of it or 
otherwise disclose its contents to anyone. Please advise the sender of your 
incorrect receipt of this correspondence.


Re: Reaping Temp tables to avoid XID wraparound

2019-02-18 Thread James Sewell
On Mon, 18 Feb 2019 at 12:31, Michael Paquier  wrote:

> On Sun, Feb 17, 2019 at 05:47:09PM +0100, Magnus Hagander wrote:
> > We could I guess add a field specifically for temp_namespace_xid or such.
> > The question is if it's worth the overhead to do that.
>
> That would mean an extra 4 bytes in PGPROC, which is something we
> could live with, still the use-case looks rather narrow to me to
> justify that.
>

I agree the use case is narrow - but it's also pretty critical.

This is a very real way that transaction wraparound can be hit, with no
automated or manual way of solving it (apart from randomly terminating
backends (you have to search via user and hope there is only one, and that
it matches the temp table owner) or restarting Postgres).

I suppose an in-core way of disconnecting idle sessions after x time would
work too - but that seems like a sledgehammer approach.

-- 
James

-- 
The contents of this email are confidential and may be subject to legal or 
professional privilege and copyright. No representation is made that this 
email is free of viruses or other defects. If you have received this 
communication in error, you may not copy or distribute any part of it or 
otherwise disclose its contents to anyone. Please advise the sender of your 
incorrect receipt of this correspondence.


Re: Reaping Temp tables to avoid XID wraparound

2019-02-17 Thread James Sewell
> Yeah, possibly.  I think that it could be tricky though to get that at
>> a global level in a cheap way.  It makes also little sense to only
>> show the temp namespace OID if that information is not enough.
>>
>
> We could I guess add a field specifically for temp_namespace_xid or such.
> The question is if it's worth the overhead to do that.
>
> Just having the namespace oid is at least enough to know that there is
> potentially something to go look at it. But it doesn't make for automated
> monitoring very well, at least not in systems that have a larger number of
> databases.
>

You can get the namespace oid today with a JOIN, the issue is that this
isn't enough information to go an look at - at the end of the day it's
useless unless you can remove the temp table or terminate the session which
owns it.

-- 
The contents of this email are confidential and may be subject to legal or 
professional privilege and copyright. No representation is made that this 
email is free of viruses or other defects. If you have received this 
communication in error, you may not copy or distribute any part of it or 
otherwise disclose its contents to anyone. Please advise the sender of your 
incorrect receipt of this correspondence.


Re: Reaping Temp tables to avoid XID wraparound

2019-02-13 Thread James Sewell
It's easy to identify the temp tables which are causing the problem, yes.
The issue here is just getting rid of them.

In an ideal world I wouldn't actually have to care about the session and I
could just drop the table (or vacuum the table?).

Dropping the session was just the best way I could find to currently solve
the problem.

Cheers,

James Sewell,



Suite 112, Jones Bay Wharf, 26-32 Pirrama Road, Pyrmont NSW 2009
*P *(+61) 2 8099 9000 <(+61)%202%208099%209000>  *W* www.jirotech.com  *F *
(+61) 2 8099 9099 <(+61)%202%208099%209000>


On Thu, 14 Feb 2019 at 04:09, Magnus Hagander  wrote:

> On Wed, Feb 13, 2019 at 6:05 PM Tom Lane  wrote:
>
>> Magnus Hagander  writes:
>> > And while at it, what would in this particular case have been even more
>> > useful to the OP would be to actually identify that there is a temp
>> table
>> > *and which xid it's blocking at*. For regular transactions we can look
>> at
>> > backend_xid, but IIRC that doesn't work for temp tables (unless they are
>> > inside a transaction). Maybe we can find a way to expose that type of
>> > relevant information at a similar level while poking around that code?
>>
>> Maybe I'm confused, but doesn't the table's pg_class row tell you what
>> you need to know?  You can't look inside another session's temp table,
>> but you don't need to.
>>
>
> I believe it does, yes.
>
> But that doesn't make for a way to conveniently go "what is it that's
> causing waparound problems", since due to pg_class being per database, you
> have to loop over all your databases to find that query. Having that
> information available in a way that's easy for monitoring to get at (much
> as the backend_xid field in pg_stat_activity can help you wrt general
> snapshots) would be useful.
>
> --
>  Magnus Hagander
>  Me: https://www.hagander.net/ <http://www.hagander.net/>
>  Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
>

-- 
The contents of this email are confidential and may be subject to legal or 
professional privilege and copyright. No representation is made that this 
email is free of viruses or other defects. If you have received this 
communication in error, you may not copy or distribute any part of it or 
otherwise disclose its contents to anyone. Please advise the sender of your 
incorrect receipt of this correspondence.


Reaping Temp tables to avoid XID wraparound

2019-02-12 Thread James Sewell
Hi all,

I hit an issue yesterday where I was quickly nearing XID wraparound on a
database due to a temp table being created in a session which was then left
IDLE out of transaction for 6 days.

I solved the issue by tracing the owner of the temp table back to a session
and terminating it - in my case I was just lucky that there was one session
for that user.

I'm looking for a way to identify the PID of the backend which owns a temp
table identified as being an issue so I can terminate it.

>From the temp table namespace I can get the backend ID using a regex - but
I have no idea how I can map that to a PID - any thoughts?

Cheers,

James Sewell,



Suite 112, Jones Bay Wharf, 26-32 Pirrama Road, Pyrmont NSW 2009
*P *(+61) 2 8099 9000 <(+61)%202%208099%209000>  *W* www.jirotech.com  *F *
(+61) 2 8099 9099 <(+61)%202%208099%209000>

-- 
The contents of this email are confidential and may be subject to legal or 
professional privilege and copyright. No representation is made that this 
email is free of viruses or other defects. If you have received this 
communication in error, you may not copy or distribute any part of it or 
otherwise disclose its contents to anyone. Please advise the sender of your 
incorrect receipt of this correspondence.


Re: Undo logs

2018-05-27 Thread James Sewell
Exciting stuff! Really looking forward to having a play with this.

James Sewell,
*Chief Architect*



Suite 112, Jones Bay Wharf, 26-32 Pirrama Road, Pyrmont NSW 2009
*P *(+61) 2 8099 9000 <(+61)%202%208099%209000>  *W* www.jirotech.com  *F *
(+61) 2 8099 9099 <(+61)%202%208099%209000>

On 25 May 2018 at 08:22, Thomas Munro <thomas.mu...@enterprisedb.com> wrote:

> Hello hackers,
>
> As announced elsewhere[1][2][3], at EnterpriseDB we are working on a
> proposal to add in-place updates with undo logs to PostgreSQL.  The
> goal is to improve performance and resource usage by recycling space
> better.
>
> The lowest level piece of this work is a physical undo log manager,
> which I've personally been working on.  Later patches will build on
> top, adding record-oriented access and then the main "zheap" access
> manager and related infrastructure.  My colleagues will write about
> those.
>
> The README files[4][5] explain in more detail, but here is a
> bullet-point description of what the attached patch set gives you:
>
> 1.  Efficient appending of new undo data from many concurrent
> backends.  Like logs.
> 2.  Efficient discarding of old undo data that isn't needed anymore.
> Like queues.
> 3.  Efficient buffered random reading of undo data.  Like relations.
>
> A test module is provided that can be used to exercise the undo log
> code paths without needing any of the later zheap patches.
>
> This is work in progress.  A few aspects are under active development
> and liable to change, as indicated by comments, and there are no doubt
> bugs and room for improvement.  The code is also available at
> github.com/EnterpriseDB/zheap (these patches are from the
> undo-log-storage branch, see also the master branch which has the full
> zheap feature).  We'd be grateful for any questions, feedback or
> ideas.
>
> [1] https://amitkapila16.blogspot.com/2018/03/zheap-storage-
> engine-to-provide-better.html
> [2] https://rhaas.blogspot.com/2018/01/do-or-undo-there-is-no-vacuum.html
> [3] https://www.pgcon.org/2018/schedule/events/1190.en.html
> [4] https://github.com/EnterpriseDB/zheap/tree/undo-
> log-storage/src/backend/access/undo
> [5] https://github.com/EnterpriseDB/zheap/tree/undo-
> log-storage/src/backend/storage/smgr
>
> --
> Thomas Munro
> http://www.enterprisedb.com
>

-- 
The contents of this email are confidential and may be subject to legal or 
professional privilege and copyright. No representation is made that this 
email is free of viruses or other defects. If you have received this 
communication in error, you may not copy or distribute any part of it or 
otherwise disclose its contents to anyone. Please advise the sender of your 
incorrect receipt of this correspondence.