Re: Failed to delete old ReorderBuffer spilled files

2018-01-09 Thread atorikoshi


On 2018/01/06 0:30, Alvaro Herrera wrote:

Ahh, so the reason I didn't see these crashes is that Atsushi had
already fixed them.  Nice.  It would be *very* good to trim the quoted
material when you reply --- don't leave everything, just enough lines
for context.  I would have seen this comment if it didn't require me to
scroll pages and pages and pages of useless material.


Sorry for the confusion.



I have pushed this patch from 9.4 to master.  I reformulated the
comments.


Thanks for modifications and committing.

Regards,

--
Atsushi Torikoshi
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: Failed to delete old ReorderBuffer spilled files

2018-01-05 Thread Alvaro Herrera
atorikoshi wrote:

> > FYI "make check" in contrib/test_decoding fails a couple of isolation
> > tests, one with an assertion failure for my automatic patch tester[1].
> > Same result on my laptop:
> > 
> > test ondisk_startup   ... FAILED (test process exited with exit 
> > code 1)
> > test concurrent_ddl_dml   ... FAILED (test process exited with exit 
> > code 1)
> > 
> > TRAP: FailedAssertion("!(!dlist_is_empty(head))", File:
> > "../../../../src/include/lib/ilist.h", Line: 458)
> 
> Thanks for letting me know.
> I think I only tested running "make check" at top directory, sorry
> for my insufficient test.
> 
> The test failure happened at the beginning of replication(creating
> slot), so there are no changes yet and getting the tail of changes
> failed.
> 
> Because the bug we are fixing only happens after creating files,
> I've added "txn->serialized" to the if statement condition.

Ahh, so the reason I didn't see these crashes is that Atsushi had
already fixed them.  Nice.  It would be *very* good to trim the quoted
material when you reply --- don't leave everything, just enough lines
for context.  I would have seen this comment if it didn't require me to
scroll pages and pages and pages of useless material.

I have pushed this patch from 9.4 to master.  I reformulated the
comments.

This is an "implicitly aborted transaction":

alvherre=# create table implicit (a int);
CREATE TABLE
Duración: 0,959 ms
alvherre=# insert into implicit select 1/i from generate_series(5, -5, -1) i;
ERROR:  division by zero

Note there is no explicit ABORT here, so it is implicit.

But that is not what you were trying to say.  The term "crashed
transaction" seems to me to convey the meaning better, so I used that.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Failed to delete old ReorderBuffer spilled files

2018-01-05 Thread Alvaro Herrera
Thomas Munro wrote:
> On Wed, Nov 22, 2017 at 12:27 AM, atorikoshi
>  wrote:
> > [set_final_lsn_2.patch]
> 
> Hi Torikoshi-san,
> 
> FYI "make check" in contrib/test_decoding fails a couple of isolation
> tests, one with an assertion failure for my automatic patch tester[1].
> Same result on my laptop:
> 
> test ondisk_startup   ... FAILED (test process exited with exit code 
> 1)
> test concurrent_ddl_dml   ... FAILED (test process exited with exit code 
> 1)
> 
> TRAP: FailedAssertion("!(!dlist_is_empty(head))", File:
> "../../../../src/include/lib/ilist.h", Line: 458)

I observed a couple of crashes too a couple of times, while testing this
patch.  But I have seen several completely different crashes.  This
crash you show I have not been able to reproduce, though I've run this
in 94 and master many times.

For example, I got a backtrace that looks like this in 9.6:

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x7f19ccb913fa in __GI_abort () at abort.c:89
#2  0x55e7511f451b in errfinish (dummy=)
at /pgsql/source/REL9_6_STABLE/src/backend/utils/error/elog.c:557
#3  0x55e750ed732b in XLogFileInit (logsegno=1, 
use_existent=use_existent@entry=0x7ffdbc34ab6f "\001\002", 
use_lock=use_lock@entry=1 '\001')
at /pgsql/source/REL9_6_STABLE/src/backend/access/transam/xlog.c:3023
#4  0x55e750edb227 in XLogWrite (WriteRqst=..., flexible=flexible@entry=0 
'\000')
at /pgsql/source/REL9_6_STABLE/src/backend/access/transam/xlog.c:2258
#5  0x55e750ee162d in XLogBackgroundFlush ()
at /pgsql/source/REL9_6_STABLE/src/backend/access/transam/xlog.c:2894

then in 9.4 I saw this one:

creating information schema ... ok
loading PL/pgSQL server-side language ... ok
vacuuming database template1 ... ok
copying template1 to template0 ... FATAL:  could not open directory 
"pg_logical/snapshots": No such file or directory
STATEMENT:  CREATE DATABASE template0;

WARNING:  could not remove file or directory "base/12148": No such file or 
directory
WARNING:  some useless files may be left behind in old database directory 
"base/12148"
FATAL:  could not access status of transaction 0
DETAIL:  Could not open file "pg_clog/": No such file or directory.
child process exited with exit code 1

What this indicates to me is that perhaps the test harness is doing
stupid things such as running two servers concurrently in the same
datadir, so they overwrite one another.  If I take out the "-j2" from
make, this no longer reproduces.

Therefore, I'm going to push this patch shortly because clearly this
problem is not its fault.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Failed to delete old ReorderBuffer spilled files

2017-11-23 Thread Craig Ringer
On 24 November 2017 at 12:38, atorikoshi  wrote:

>
>
> On 2017/11/24 10:57, Craig Ringer wrote:
> > On 24 November 2017 at 09:20, atorikoshi  co.jp
> >> wrote:
> >
> >>
> >> Thanks for letting me know.
> >> I think I only tested running "make check" at top directory, sorry
> >> for my insufficient test.
> >>
> >> The test failure happened at the beginning of replication(creating
> >> slot), so there are no changes yet and getting the tail of changes
> >> failed.
> >>
> >> Because the bug we are fixing only happens after creating files,
> >> I've added "txn->serialized" to the if statement condition.
> >
> >
> > Thanks.
> >
> > Re-reading the patch I see
> >
> >  * The final_lsn of which transaction that hasn't produced an abort
> >  * record is invalid.
> >
> > which I find very hard to parse. I suggest:
> >
> >  We set final_lsn when we decode the commit or abort record for a
> > transaction,
> >  but transactions implicitly aborted by a crash never write such a
> record.
> >
> > then continue from there with the same text as in the patch.
> >
> > Otherwise I'm happy. It passes make check, test decoding and the recovery
> > TAP tests too.
> >
>
> Thanks for your kind suggestion and running test.
> I've added your comment at the beginning.
>
>
Marked ready for committer.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Failed to delete old ReorderBuffer spilled files

2017-11-23 Thread atorikoshi



On 2017/11/24 10:57, Craig Ringer wrote:
> On 24 November 2017 at 09:20, atorikoshi 
> wrote:
>
>>
>> Thanks for letting me know.
>> I think I only tested running "make check" at top directory, sorry
>> for my insufficient test.
>>
>> The test failure happened at the beginning of replication(creating
>> slot), so there are no changes yet and getting the tail of changes
>> failed.
>>
>> Because the bug we are fixing only happens after creating files,
>> I've added "txn->serialized" to the if statement condition.
>
>
> Thanks.
>
> Re-reading the patch I see
>
>  * The final_lsn of which transaction that hasn't produced an abort
>  * record is invalid.
>
> which I find very hard to parse. I suggest:
>
>  We set final_lsn when we decode the commit or abort record for a
> transaction,
>  but transactions implicitly aborted by a crash never write such a 
record.

>
> then continue from there with the same text as in the patch.
>
> Otherwise I'm happy. It passes make check, test decoding and the recovery
> TAP tests too.
>

Thanks for your kind suggestion and running test.
I've added your comment at the beginning.

--
Atsushi Torikoshi
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

diff --git a/src/backend/replication/logical/reorderbuffer.c 
b/src/backend/replication/logical/reorderbuffer.c
index 0f607ba..ee28d26 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -1724,6 +1724,22 @@ ReorderBufferAbortOld(ReorderBuffer *rb, TransactionId 
oldestRunningXid)
 
if (TransactionIdPrecedes(txn->xid, oldestRunningXid))
{
+   /*
+* We set final_lsn when we decode the commit or abort 
record for
+* a transaction, but transactions implicitly aborted 
by a crash
+* never write such a record.
+* The final_lsn of which transaction that hasn't 
produced an abort
+* record is invalid. To ensure cleanup the serialized 
changes of
+* such transaction we set the LSN of the last change 
action to
+* final_lsn.
+*/
+   if (txn->serialized && txn->final_lsn == 0)
+   {
+   ReorderBufferChange *last_change =
+   dlist_tail_element(ReorderBufferChange, node, 
>changes);
+   txn->final_lsn = last_change->lsn;
+   }
+
elog(DEBUG2, "aborting old transaction %u", txn->xid);
 
/* remove potential on-disk data, and deallocate this 
tx */
diff --git a/src/include/replication/reorderbuffer.h 
b/src/include/replication/reorderbuffer.h
index 86effe1..7931757 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -168,6 +168,8 @@ typedef struct ReorderBufferTXN
 * * plain abort record
 * * prepared transaction abort
 * * error during decoding
+* Note that this can also be a LSN of the last change action of this 
xact
+* if it is an implicitly aborted transaction.
 * 
 */
XLogRecPtr  final_lsn;


Re: Failed to delete old ReorderBuffer spilled files

2017-11-23 Thread Craig Ringer
On 24 November 2017 at 09:20, atorikoshi  wrote:

>
> Thanks for letting me know.
> I think I only tested running "make check" at top directory, sorry
> for my insufficient test.
>
> The test failure happened at the beginning of replication(creating
> slot), so there are no changes yet and getting the tail of changes
> failed.
>
> Because the bug we are fixing only happens after creating files,
> I've added "txn->serialized" to the if statement condition.


Thanks.

Re-reading the patch I see

 * The final_lsn of which transaction that hasn't produced an abort
 * record is invalid.

which I find very hard to parse. I suggest:

 We set final_lsn when we decode the commit or abort record for a
transaction,
 but transactions implicitly aborted by a crash never write such a record.

then continue from there with the same text as in the patch.

Otherwise I'm happy. It passes make check, test decoding and the recovery
TAP tests too.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Failed to delete old ReorderBuffer spilled files

2017-11-23 Thread atorikoshi

On 2017/11/22 16:49, Masahiko Sawada wrote:

On Wed, Nov 22, 2017 at 2:56 PM, Craig Ringer  wrote:

On 22 November 2017 at 12:15, Kyotaro HORIGUCHI
 wrote:


At Wed, 22 Nov 2017 12:57:34 +0900, Michael Paquier
 wrote in

Re: Failed to delete old ReorderBuffer spilled files

2017-11-21 Thread Masahiko Sawada
On Wed, Nov 22, 2017 at 2:56 PM, Craig Ringer  wrote:
> On 22 November 2017 at 12:15, Kyotaro HORIGUCHI
>  wrote:
>>
>> At Wed, 22 Nov 2017 12:57:34 +0900, Michael Paquier
>>  wrote in
>> 

Re: Failed to delete old ReorderBuffer spilled files

2017-11-21 Thread Craig Ringer
On 22 November 2017 at 12:15, Kyotaro HORIGUCHI <
horiguchi.kyot...@lab.ntt.co.jp> wrote:

> At Wed, 22 Nov 2017 12:57:34 +0900, Michael Paquier <
> michael.paqu...@gmail.com> wrote in 

Re: Failed to delete old ReorderBuffer spilled files

2017-11-21 Thread Kyotaro HORIGUCHI
At Wed, 22 Nov 2017 12:57:34 +0900, Michael Paquier  
wrote in 

Re: Failed to delete old ReorderBuffer spilled files

2017-11-21 Thread Kyotaro HORIGUCHI
At Tue, 21 Nov 2017 20:53:04 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
 wrote in 
<20171121.205304.90315453.horiguchi.kyot...@lab.ntt.co.jp>
> At Tue, 21 Nov 2017 20:27:25 +0900, atorikoshi 
>  wrote in 
> 
> > Thanks for reviewing!
> > 
> > 
> > On 2017/11/21 18:12, Masahiko Sawada wrote:
> > > On Tue, Nov 21, 2017 at 3:48 PM, Masahiko Sawada
> > >  wrote:
> > >> On Mon, Nov 20, 2017 at 7:35 PM, atorikoshi
> > >>  wrote:
> > >>> Hi,
> > >>>
> > >>> I put many queries into one transaction and made ReorderBuffer spill
> > >>> data to disk, and sent SIGKILL to postgres before the end of the
> > >>> transaction.
> > >>>
> > >>> After starting up postgres again, I observed the files spilled to
> > >>> data wasn't deleted.
> > >>>
> > >>> I think these files should be deleted because its transaction was no
> > >>> more valid, so no one can use these files.
> > >>>
> > >>>
> > >>> Below is a reproduction instructions.
> > >>>
> > >>> 
> > >>> 1. Create table and publication at publiser.
> > >>>
> > >>>   @pub =# CREATE TABLE t1(
> > >>>   id INT PRIMARY KEY,
> > >>>   name TEXT);
> > >>>
> > >>>   @pub =# CREATE PUBLICATION pub FOR TABLE t1;
> > >>>
> > >>> 2. Create table and subscription at subscriber.
> > >>>
> > >>>   @sub =# CREATE TABLE t1(
> > >>>   id INT PRIMARY KEY,
> > >>>   name TEXT
> > >>>   );
> > >>>
> > >>>   @sub =# CREATE SUBSCRIPTION sub
> > >>>   CONNECTION 'host=[hostname] port=[port] dbname=[dbname]'
> > >>>   PUBLICATION pub;
> > >>>
> > >>> 3. Put many queries into one transaction.
> > >>>
> > >>>   @pub =# BEGIN;
> > >>> INSERT INTO t1
> > >>> SELECT
> > >>>   i,
> > >>>   'aa'
> > >>> FROM
> > >>> generate_series(1, 100) as i;
> > >>>
> > >>> 4. Then we can see spilled files.
> > >>>
> > >>>   @pub $ ls -1 ${PGDATA}/pg_replslot/sub/
> > >>> state
> > >>> xid-561-lsn-0-100.snap
> > >>> xid-561-lsn-0-200.snap
> > >>> xid-561-lsn-0-300.snap
> > >>> xid-561-lsn-0-400.snap
> > >>> xid-561-lsn-0-500.snap
> > >>> xid-561-lsn-0-600.snap
> > >>> xid-561-lsn-0-700.snap
> > >>> xid-561-lsn-0-800.snap
> > >>> xid-561-lsn-0-900.snap
> > >>>
> > >>> 5. Kill publisher's postgres process before COMMIT.
> > >>>
> > >>>   @pub $ kill -s SIGKILL [pid of postgres]
> > >>>
> > >>> 6. Start publisher's postgres process.
> > >>>
> > >>>   @pub $ pg_ctl start -D ${PGDATA}
> > >>>
> > >>> 7. After a while, we can see the files remaining.
> > >>>   (Immediately after starting publiser, we can not see these files.)
> > >>>
> > >>>   @pub $ pg_ctl start -D ${PGDATA}
> > >>>
> > >>>   When I configured with '--enable-cassert', below assertion error
> > >>>   was appeared.
> > >>>
> > >>> TRAP: FailedAssertion("!(txn->final_lsn != 0)", File:
> > >>> "reorderbuffer.c",
> > >>> Line: 2576)
> > >>> 
> > >>>
> > >>> Attached patch sets final_lsn to the last ReorderBufferChange if
> > >>> final_lsn == 0.
> > >>
> > >> Thank you for the report. I could reproduce this issue with the above
> > >> step. My analysis is, the cause of that a serialized reorder buffer
> > >> isn't cleaned up is that the aborted transaction without an abort WAL
> > >> record has no chance to set ReorderBufferTXN->final_lsn. So if there
> > >> is such serialized transactions ReorderBufferRestoreCleanup cleanups
> > >> no files, which is cause of the assertion failure (or a file being
> > >> orphaned). What do you think?
> > >>
> > >> On detail of your patch, I'm not sure it's safe if we set the lsn of
> > >> other than commit record or abort record to final_lsn. The comment in
> > >> reorderbuffer.h says,
> > >>
> > >> typedef trcut ReorderBufferTXN
> > >> {
> > >> (snip)
> > >>
> > >> /* 
> > >>  * LSN of the record that lead to this xact to be committed or
> > >>  * aborted. This can be a
> > >>  * * plain commit record
> > >>  * * plain commit record, of a parent transaction
> > >>  * * prepared transaction commit
> > >>  * * plain abort record
> > >>  * * prepared transaction abort
> > >>  * * error during decoding
> > >>  * 
> > >>  */
> > >> XLogRecPtr  final_lsn;
> > >>
> > >> But with your patch, we could set a lsn of a record that is other than
> > >> what listed above to final_lsn. One way I came up with is to make
> > 
> > I added some comments on final_lsn.
> > 
> > >> ReorderBufferRestoreCleanup accept an invalid value of final_lsn and
> > >> regards it as a aborted transaction that doesn't has a abort WAL
> > >> record. So we can cleanup all serialized files if final_lsn of a
> > >> transaction is invalid.
> > >
> > > After more 

Re: Failed to delete old ReorderBuffer spilled files

2017-11-21 Thread Kyotaro HORIGUCHI
Hello,

At Tue, 21 Nov 2017 20:27:25 +0900, atorikoshi 
 wrote in 

> Thanks for reviewing!
> 
> 
> On 2017/11/21 18:12, Masahiko Sawada wrote:
> > On Tue, Nov 21, 2017 at 3:48 PM, Masahiko Sawada
> >  wrote:
> >> On Mon, Nov 20, 2017 at 7:35 PM, atorikoshi
> >>  wrote:
> >>> Hi,
> >>>
> >>> I put many queries into one transaction and made ReorderBuffer spill
> >>> data to disk, and sent SIGKILL to postgres before the end of the
> >>> transaction.
> >>>
> >>> After starting up postgres again, I observed the files spilled to
> >>> data wasn't deleted.
> >>>
> >>> I think these files should be deleted because its transaction was no
> >>> more valid, so no one can use these files.
> >>>
> >>>
> >>> Below is a reproduction instructions.
> >>>
> >>> 
> >>> 1. Create table and publication at publiser.
> >>>
> >>>   @pub =# CREATE TABLE t1(
> >>>   id INT PRIMARY KEY,
> >>>   name TEXT);
> >>>
> >>>   @pub =# CREATE PUBLICATION pub FOR TABLE t1;
> >>>
> >>> 2. Create table and subscription at subscriber.
> >>>
> >>>   @sub =# CREATE TABLE t1(
> >>>   id INT PRIMARY KEY,
> >>>   name TEXT
> >>>   );
> >>>
> >>>   @sub =# CREATE SUBSCRIPTION sub
> >>>   CONNECTION 'host=[hostname] port=[port] dbname=[dbname]'
> >>>   PUBLICATION pub;
> >>>
> >>> 3. Put many queries into one transaction.
> >>>
> >>>   @pub =# BEGIN;
> >>> INSERT INTO t1
> >>> SELECT
> >>>   i,
> >>>   'aa'
> >>> FROM
> >>> generate_series(1, 100) as i;
> >>>
> >>> 4. Then we can see spilled files.
> >>>
> >>>   @pub $ ls -1 ${PGDATA}/pg_replslot/sub/
> >>> state
> >>> xid-561-lsn-0-100.snap
> >>> xid-561-lsn-0-200.snap
> >>> xid-561-lsn-0-300.snap
> >>> xid-561-lsn-0-400.snap
> >>> xid-561-lsn-0-500.snap
> >>> xid-561-lsn-0-600.snap
> >>> xid-561-lsn-0-700.snap
> >>> xid-561-lsn-0-800.snap
> >>> xid-561-lsn-0-900.snap
> >>>
> >>> 5. Kill publisher's postgres process before COMMIT.
> >>>
> >>>   @pub $ kill -s SIGKILL [pid of postgres]
> >>>
> >>> 6. Start publisher's postgres process.
> >>>
> >>>   @pub $ pg_ctl start -D ${PGDATA}
> >>>
> >>> 7. After a while, we can see the files remaining.
> >>>   (Immediately after starting publiser, we can not see these files.)
> >>>
> >>>   @pub $ pg_ctl start -D ${PGDATA}
> >>>
> >>>   When I configured with '--enable-cassert', below assertion error
> >>>   was appeared.
> >>>
> >>> TRAP: FailedAssertion("!(txn->final_lsn != 0)", File:
> >>> "reorderbuffer.c",
> >>> Line: 2576)
> >>> 
> >>>
> >>> Attached patch sets final_lsn to the last ReorderBufferChange if
> >>> final_lsn == 0.
> >>
> >> Thank you for the report. I could reproduce this issue with the above
> >> step. My analysis is, the cause of that a serialized reorder buffer
> >> isn't cleaned up is that the aborted transaction without an abort WAL
> >> record has no chance to set ReorderBufferTXN->final_lsn. So if there
> >> is such serialized transactions ReorderBufferRestoreCleanup cleanups
> >> no files, which is cause of the assertion failure (or a file being
> >> orphaned). What do you think?
> >>
> >> On detail of your patch, I'm not sure it's safe if we set the lsn of
> >> other than commit record or abort record to final_lsn. The comment in
> >> reorderbuffer.h says,
> >>
> >> typedef trcut ReorderBufferTXN
> >> {
> >> (snip)
> >>
> >> /* 
> >>  * LSN of the record that lead to this xact to be committed or
> >>  * aborted. This can be a
> >>  * * plain commit record
> >>  * * plain commit record, of a parent transaction
> >>  * * prepared transaction commit
> >>  * * plain abort record
> >>  * * prepared transaction abort
> >>  * * error during decoding
> >>  * 
> >>  */
> >> XLogRecPtr  final_lsn;
> >>
> >> But with your patch, we could set a lsn of a record that is other than
> >> what listed above to final_lsn. One way I came up with is to make
> 
> I added some comments on final_lsn.
> 
> >> ReorderBufferRestoreCleanup accept an invalid value of final_lsn and
> >> regards it as a aborted transaction that doesn't has a abort WAL
> >> record. So we can cleanup all serialized files if final_lsn of a
> >> transaction is invalid.
> >
> > After more thought, since there are some codes cosmetically setting
> > final_lsn when the fate of transaction is determined possibly we
> > should not accept a invalid value of final_lsn even in the case.
> >

It doesn't seem just a cosmetic. The first/last_lsn are used to
determin the files to be deleted. On the other hand the TXN
cannot have last_lsn since it hasn't see abort/commit record.

> My new patch keeps setting final_lsn, but changed its 

Failed to delete old ReorderBuffer spilled files

2017-11-20 Thread atorikoshi

Hi,

I put many queries into one transaction and made ReorderBuffer spill
data to disk, and sent SIGKILL to postgres before the end of the
transaction.

After starting up postgres again, I observed the files spilled to
data wasn't deleted.

I think these files should be deleted because its transaction was no
more valid, so no one can use these files.


Below is a reproduction instructions.


1. Create table and publication at publiser.

  @pub =# CREATE TABLE t1(
  id INT PRIMARY KEY,
  name TEXT);

  @pub =# CREATE PUBLICATION pub FOR TABLE t1;

2. Create table and subscription at subscriber.

  @sub =# CREATE TABLE t1(
  id INT PRIMARY KEY,
  name TEXT
  );

  @sub =# CREATE SUBSCRIPTION sub
  CONNECTION 'host=[hostname] port=[port] dbname=[dbname]'
  PUBLICATION pub;

3. Put many queries into one transaction.

  @pub =# BEGIN;
INSERT INTO t1
SELECT
  i,
  'aa'
FROM
generate_series(1, 100) as i;

4. Then we can see spilled files.

  @pub $ ls -1 ${PGDATA}/pg_replslot/sub/
state
xid-561-lsn-0-100.snap
xid-561-lsn-0-200.snap
xid-561-lsn-0-300.snap
xid-561-lsn-0-400.snap
xid-561-lsn-0-500.snap
xid-561-lsn-0-600.snap
xid-561-lsn-0-700.snap
xid-561-lsn-0-800.snap
xid-561-lsn-0-900.snap

5. Kill publisher's postgres process before COMMIT.

  @pub $ kill -s SIGKILL [pid of postgres]

6. Start publisher's postgres process.

  @pub $ pg_ctl start -D ${PGDATA}

7. After a while, we can see the files remaining.
  (Immediately after starting publiser, we can not see these files.)

  @pub $ pg_ctl start -D ${PGDATA}

  When I configured with '--enable-cassert', below assertion error
  was appeared.

TRAP: FailedAssertion("!(txn->final_lsn != 0)", File:  
"reorderbuffer.c", Line: 2576)



Attached patch sets final_lsn to the last ReorderBufferChange if
final_lsn == 0.

--
Atsushi Torikoshi
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

diff --git a/src/backend/replication/logical/reorderbuffer.c 
b/src/backend/replication/logical/reorderbuffer.c
index 0f607ba..51d5b71 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -1093,6 +1093,20 @@ ReorderBufferCleanupTXN(ReorderBuffer *rb, 
ReorderBufferTXN *txn)
ReorderBufferCleanupTXN(rb, subtxn);
}
 
+   /*
+* If this transaction encountered crash during transaction,
+* txn->final_lsn remains initial value.
+* To properly remove entries which were spilled to disk, we need valid
+* final_lsn.
+* So we set final_lsn to the lsn of the last ReorderBufferChange.
+*/
+   if (txn->final_lsn == 0)
+   {
+   ReorderBufferChange *last_change =
+   dlist_tail_element(ReorderBufferChange, node, >changes);
+   txn->final_lsn = last_change->lsn;
+   }
+
/* cleanup changes in the toplevel txn */
dlist_foreach_modify(iter, >changes)
{