Re: [HACKERS] [RFC] Transaction management overhaul is necessary?

2016-10-25 Thread Tsunakawa, Takayuki
From: Craig Ringer [mailto:cr...@2ndquadrant.com]
> >> This was because psqlODBC starts and ends a subtransaction for each
> >> SQL statement by default to implement statement-level rollback.  And
> >> PostgreSQL creates one CurTransactionContext memory context, which is
> >> 8KB, for each subtransaction and retain them until the top transaction
> ends.
> 
> Surely that's where to start then. Find a way to pool and re-use, fully
> release, or otherwise be done with transaction contexts for released
> savepoints.

Yes, I'll investigate this.  Any reference information would be appreciated on 
why the CurTransactionContexts had to be retained, and whether it's difficult 
to circumvent.


> You can control transaction level rollback in psqlODBC directly. You do
> not need to fall back to the old protocol. Check the driver options.

That driver option is Protocol=7.4-1.  The name is misleading, as the driver 
now ignores version part (7.4), and interprets 1 as transaction-rollback.


> Right. We can't just fire off each statement wrapped in SAVEPOINT and RELEASE
> SAVEPOINT because we need to get the result of the statement and decide
> whether to ROLLBACK TO SAVEPOINT or RELEASE SAVEPOINT. It only requires
> two round trips if you shove the SAVEPOINT in with the intended statement,
> but it's still messy.
> 
> I'd like to see an alternative statement with semantics more akin to COMMIT
> - which automatically into ROLLBACK if the tx is aborted.
> COMMIT SAVEPOINT would be too confusing since it's not truly committed.
> I don't know what to call it. But basically something that does RELEASE
> SAVEPOINT [named savepoint] unless the subxact is in aborted state, in which
> case it does ROLLBACK TO [named savepoint].
> Bonus points for letting it remember the last savepoint created and use
> that.
> 
> Furthermore, we should really add it on the protocol level so drivers can
> send subtransaction control messages more compactly, without needing to
> go through the parser etc, and without massively spamming the logs. For
> this purpose savepoint names would be internally generated so the driver
> wouldn't have to send them. We'd log savepoint boundaries when transaction
> logging was enabled. Since the client would send the first such protocol
> request we could do it on the sly without a protocol version bump; clients
> could just check server version and not use the new messages for older
> servers. If they send it to an older server they get a protocol error, which
> is fine.

I'm simply thinking of proposing a new GUC, something like "SET auto_rollback = 
{none | statement | transaction}", where none is the default and traditional 
behavior.


> > You should to implement a CALL statement - that can be independent on
> > outer transaction. The behave inside procedure called by CALL
> > statement should be same like client side - and there you can controll
> > transactions explicitly without nesting.
> 
> I agree that'd be desirable. Top level "procedures" are necessary for this,
> really.
> 
> This would also enable us to return multiple result sets.
> 
> We'd probably have to start at least one small read-only tx for the initial
> cache access to look up the proc and set everything up, but if we don't
> allocate xids local transactions are super cheap.

OK, that would be a very big challenge... I can't imagine how difficult it will 
be now.  But supporting the stored procedure with CALL statement would be a 
wall to overcome.


> However, I think trying to tackle the memory context bloat reported upthread
> would be a more effective starting point since it immediately targets the

Yes, I think I'll address this.  Maybe I'll start different threads for each 
topic:

1. Memory context bloat
2. Statement-level rollback
3. Stored procedures where transactions can be ended and started

Regards
Takayuki Tsunakawa


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] Transaction management overhaul is necessary?

2016-10-21 Thread Craig Ringer
On 21 October 2016 at 18:57, Pavel Stehule  wrote:
> 2016-10-21 10:24 GMT+02:00 Tsunakawa, Takayuki
> :
>>
>> Hello,
>>
>> From our experience in handling customers' problems, I feel it's necessary
>> to evolve PostgreSQL's transaction management.  The concrete problems are:
>>
>> 1. PostgreSQL cannot end and begin transactions in PL/pgSQL and PL/Java
>> stored functions.
>> This is often the reason people could not migrate to PostgreSQL.

I've run into relatively few for whom this has landed up being a
showstopper, but I agree it's a pain.

There'll probably be more as bigger outfits seek to move from That
Other Database where it's routine to do this.

>> This was because psqlODBC starts and ends a subtransaction for each SQL
>> statement by default to implement statement-level rollback.  And PostgreSQL
>> creates one CurTransactionContext memory context, which is 8KB, for each
>> subtransaction and retain them until the top transaction ends.

Surely that's where to start then. Find a way to pool and re-use,
fully release, or otherwise be done with transaction contexts for
released savepoints.

>>  The total
>> memory used becomes 40GB (8KB * 5 million subtransactions.)  This was
>> avoided by setting the Protocol parameter to 7.4-1, which means
>> transaction-level rollback.

You can control transaction level rollback in psqlODBC directly. You
do not need to fall back to the old protocol. Check the driver
options.

>> The savepoint approach for supporting statement-level rollback is
>> inefficient, because it adds two roundtrips (SAVEPOINT and RELEASE) for each
>> statement.

Right. We can't just fire off each statement wrapped in SAVEPOINT and
RELEASE SAVEPOINT because we need to get the result of the statement
and decide whether to ROLLBACK TO SAVEPOINT or RELEASE SAVEPOINT. It
only requires two round trips if you shove the SAVEPOINT in with the
intended statement, but it's still messy.

I'd like to see an alternative statement with semantics more akin to
COMMIT - which automatically into ROLLBACK if the tx is aborted.
COMMIT SAVEPOINT would be too confusing since it's not truly
committed. I don't know what to call it. But basically something that
does RELEASE SAVEPOINT [named savepoint] unless the subxact is in
aborted state, in which case it does ROLLBACK TO [named savepoint].
Bonus points for letting it remember the last savepoint created and
use that.

Furthermore, we should really add it on the protocol level so drivers
can send subtransaction control messages more compactly, without
needing to go through the parser etc, and without massively spamming
the logs. For this purpose savepoint names would be internally
generated so the driver wouldn't have to send them. We'd log savepoint
boundaries when transaction logging was enabled. Since the client
would send the first such protocol request we could do it on the sly
without a protocol version bump; clients could just check server
version and not use the new messages for older servers. If they send
it to an older server they get a protocol error, which is fine.

> You should to implement a CALL statement - that can be independent on outer
> transaction. The behave inside procedure called by CALL statement should be
> same like client side - and there you can controll transactions explicitly
> without nesting.

I agree that'd be desirable. Top level "procedures" are necessary for
this, really.

This would also enable us to return multiple result sets.

We'd probably have to start at least one small read-only tx for the
initial cache access to look up the proc and set everything up, but if
we don't allocate xids local transactions are super cheap.

However, I think trying to tackle the memory context bloat reported
upthread would be a more effective starting point since it immediately
targets the problem actually experienced.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] Transaction management overhaul is necessary?

2016-10-21 Thread Pavel Stehule
2016-10-21 10:24 GMT+02:00 Tsunakawa, Takayuki <
tsunakawa.ta...@jp.fujitsu.com>:

> Hello,
>
> From our experience in handling customers' problems, I feel it's necessary
> to evolve PostgreSQL's transaction management.  The concrete problems are:
>
> 1. PostgreSQL cannot end and begin transactions in PL/pgSQL and PL/Java
> stored functions.
> This is often the reason people could not migrate to PostgreSQL.


>
> 2. PostgreSQL does not support statement-level rollback.
> When some customer ran a batch app using psqlODBC, one postgres process
> used dozens of GBs of memory and crashed the OS.  The batch app prepares
> some SQL statements with parameters, execute it five millions of times with
> different parameter values in a single transaction.  They didn't experience
> a problem with Oracle.
>
> This was because psqlODBC starts and ends a subtransaction for each SQL
> statement by default to implement statement-level rollback.  And PostgreSQL
> creates one CurTransactionContext memory context, which is 8KB, for each
> subtransaction and retain them until the top transaction ends.  The total
> memory used becomes 40GB (8KB * 5 million subtransactions.)  This was
> avoided by setting the Protocol parameter to 7.4-1, which means
> transaction-level rollback.
>
> The savepoint approach for supporting statement-level rollback is
> inefficient, because it adds two roundtrips (SAVEPOINT and RELEASE) for
> each statement.
>
>
>
> I know autonomous transaction is also discussed, which seems to be
> difficult, so I hope some kind of transaction management overhaul can be
> discussed to cover all these transaction-related features.  How should I
> start?  I found the following item in the TODO list (but I haven't read it
> yet.)  What other discussions should I look at?
>

You should to implement a CALL statement - that can be independent on outer
transaction. The behave inside procedure called by CALL statement should be
same like client side - and there you can controll transactions explicitly
without nesting.

Regards

Pavel


>
> --
> Implement stored procedures
> This might involve the control of transaction state and the return of
> multiple result sets
> PL/pgSQL stored procedure returning multiple result sets (SELECTs)?
> Proposal: real procedures again (8.4)
> http://archives.postgresql.org/pgsql-hackers/2010-09/msg00542.php
> Gathering specs and discussion on feature (post 9.1)
> --
>
>
> Regards
> Takayuki Tsunakawa
>
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


[HACKERS] [RFC] Transaction management overhaul is necessary?

2016-10-21 Thread Tsunakawa, Takayuki
Hello,

>From our experience in handling customers' problems, I feel it's necessary to 
>evolve PostgreSQL's transaction management.  The concrete problems are:

1. PostgreSQL cannot end and begin transactions in PL/pgSQL and PL/Java stored 
functions.
This is often the reason people could not migrate to PostgreSQL.


2. PostgreSQL does not support statement-level rollback.
When some customer ran a batch app using psqlODBC, one postgres process used 
dozens of GBs of memory and crashed the OS.  The batch app prepares some SQL 
statements with parameters, execute it five millions of times with different 
parameter values in a single transaction.  They didn't experience a problem 
with Oracle.

This was because psqlODBC starts and ends a subtransaction for each SQL 
statement by default to implement statement-level rollback.  And PostgreSQL 
creates one CurTransactionContext memory context, which is 8KB, for each 
subtransaction and retain them until the top transaction ends.  The total 
memory used becomes 40GB (8KB * 5 million subtransactions.)  This was avoided 
by setting the Protocol parameter to 7.4-1, which means transaction-level 
rollback.

The savepoint approach for supporting statement-level rollback is inefficient, 
because it adds two roundtrips (SAVEPOINT and RELEASE) for each statement.



I know autonomous transaction is also discussed, which seems to be difficult, 
so I hope some kind of transaction management overhaul can be discussed to 
cover all these transaction-related features.  How should I start?  I found the 
following item in the TODO list (but I haven't read it yet.)  What other 
discussions should I look at?

--
Implement stored procedures 
This might involve the control of transaction state and the return of multiple 
result sets 
PL/pgSQL stored procedure returning multiple result sets (SELECTs)? 
Proposal: real procedures again (8.4) 
http://archives.postgresql.org/pgsql-hackers/2010-09/msg00542.php 
Gathering specs and discussion on feature (post 9.1) 
--


Regards
Takayuki Tsunakawa



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers