Re: Fix a typo in walsender.c

2018-03-28 Thread atorikoshi

On 2018/03/29 7:23, Bruce Momjian wrote:


Thanks, patch applied.


Thank you for committing!





Fix a typo in walsender.c

2018-02-27 Thread atorikoshi

Hi,

Attached a minor patch for a typo: s/log/lag

Regards,

--
Atsushi Torikoshi
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
diff --git a/src/backend/replication/walsender.c 
b/src/backend/replication/walsender.c
index d46374d..8bb1919 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1245,7 +1245,7 @@ WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr 
lsn, TransactionId xid,
 /*
  * LogicalDecodingContext 'update_progress' callback.
  *
- * Write the current position to the log tracker (see XLogSendPhysical).
+ * Write the current position to the lag tracker (see XLogSendPhysical).
  */
 static void
 WalSndUpdateProgress(LogicalDecodingContext *ctx, XLogRecPtr lsn, 
TransactionId xid)


Re: Fix a typo in walsender.c

2018-02-08 Thread atorikoshi

On 2018/02/09 4:40, Robert Haas wrote:

On Thu, Feb 8, 2018 at 1:32 AM, atorikoshi
 wrote:

Attached a minor patch for variable name in comment:


Committed.



Thank you!




Fix a typo in walsender.c

2018-02-07 Thread atorikoshi

Hi,

Attached a minor patch for variable name in comment:  
s/progress_update/update_progress


  ---include/server/replication/logical.h
  ...
  35 typedef struct LogicalDecodingContext
  36 {
  ...
  68 LogicalOutputPluginWriterUpdateProgress update_progress;

Regards,

--
Atsushi Torikoshi
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
diff --git a/src/backend/replication/walsender.c 
b/src/backend/replication/walsender.c
index 130ecd5..d46374d 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1243,7 +1243,7 @@ WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr 
lsn, TransactionId xid,
 }
 
 /*
- * LogicalDecodingContext 'progress_update' callback.
+ * LogicalDecodingContext 'update_progress' callback.
  *
  * Write the current position to the log tracker (see XLogSendPhysical).
  */


Re: User defined data types in Logical Replication

2018-01-30 Thread atorikoshi

Hi,

It seems to be in the middle of discussion, but I became a reviewer of
this problem several days ago so I've tested the latest patch
'fix_slot_store_error_callback_v12.patch'.

I reproduced the below ERROR by inserting elog() to INPUT function
of CREATE TYPE, and confirmed that above patch solves the problem.

>>  ERROR:  XX000: cache lookup failed for type X

I also ran make check-world and there was no error.

Is the only remaining topic memory leaks?


On 2018/01/09 17:22, Masahiko Sawada wrote:

>> I suppose it's not a common use pattern, but it'd be good
>> to avoid everlasting memleaks.
>
> I agree. Can we remove entry from hash table in the callbacks instead
> of setting InvalidOid when invalidate caches?


--
Atsushi Torikoshi
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: Failed to delete old ReorderBuffer spilled files

2018-01-09 Thread atorikoshi


On 2018/01/06 0:30, Alvaro Herrera wrote:

Ahh, so the reason I didn't see these crashes is that Atsushi had
already fixed them.  Nice.  It would be *very* good to trim the quoted
material when you reply --- don't leave everything, just enough lines
for context.  I would have seen this comment if it didn't require me to
scroll pages and pages and pages of useless material.


Sorry for the confusion.



I have pushed this patch from 9.4 to master.  I reformulated the
comments.


Thanks for modifications and committing.

Regards,

--
Atsushi Torikoshi
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: Failed to delete old ReorderBuffer spilled files

2017-11-23 Thread atorikoshi



On 2017/11/24 10:57, Craig Ringer wrote:
> On 24 November 2017 at 09:20, atorikoshi 

>> wrote:
>
>>
>> Thanks for letting me know.
>> I think I only tested running "make check" at top directory, sorry
>> for my insufficient test.
>>
>> The test failure happened at the beginning of replication(creating
>> slot), so there are no changes yet and getting the tail of changes
>> failed.
>>
>> Because the bug we are fixing only happens after creating files,
>> I've added "txn->serialized" to the if statement condition.
>
>
> Thanks.
>
> Re-reading the patch I see
>
>  * The final_lsn of which transaction that hasn't produced an abort
>  * record is invalid.
>
> which I find very hard to parse. I suggest:
>
>  We set final_lsn when we decode the commit or abort record for a
> transaction,
>  but transactions implicitly aborted by a crash never write such a 
record.

>
> then continue from there with the same text as in the patch.
>
> Otherwise I'm happy. It passes make check, test decoding and the recovery
> TAP tests too.
>

Thanks for your kind suggestion and running test.
I've added your comment at the beginning.

--
Atsushi Torikoshi
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

diff --git a/src/backend/replication/logical/reorderbuffer.c 
b/src/backend/replication/logical/reorderbuffer.c
index 0f607ba..ee28d26 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -1724,6 +1724,22 @@ ReorderBufferAbortOld(ReorderBuffer *rb, TransactionId 
oldestRunningXid)
 
if (TransactionIdPrecedes(txn->xid, oldestRunningXid))
{
+   /*
+* We set final_lsn when we decode the commit or abort 
record for
+* a transaction, but transactions implicitly aborted 
by a crash
+* never write such a record.
+* The final_lsn of which transaction that hasn't 
produced an abort
+* record is invalid. To ensure cleanup the serialized 
changes of
+* such transaction we set the LSN of the last change 
action to
+* final_lsn.
+*/
+   if (txn->serialized && txn->final_lsn == 0)
+   {
+   ReorderBufferChange *last_change =
+   dlist_tail_element(ReorderBufferChange, node, 
&txn->changes);
+   txn->final_lsn = last_change->lsn;
+   }
+
elog(DEBUG2, "aborting old transaction %u", txn->xid);
 
/* remove potential on-disk data, and deallocate this 
tx */
diff --git a/src/include/replication/reorderbuffer.h 
b/src/include/replication/reorderbuffer.h
index 86effe1..7931757 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -168,6 +168,8 @@ typedef struct ReorderBufferTXN
 * * plain abort record
 * * prepared transaction abort
 * * error during decoding
+* Note that this can also be a LSN of the last change action of this 
xact
+* if it is an implicitly aborted transaction.
 * 
 */
XLogRecPtr  final_lsn;


Re: Failed to delete old ReorderBuffer spilled files

2017-11-23 Thread atorikoshi

On 2017/11/22 16:49, Masahiko Sawada wrote:

On Wed, Nov 22, 2017 at 2:56 PM, Craig Ringer  wrote:

On 22 November 2017 at 12:15, Kyotaro HORIGUCHI
 wrote:


At Wed, 22 Nov 2017 12:57:34 +0900, Michael Paquier
 wrote in


On Wed, Nov 22, 2017 at 11:49 AM, Craig Ringer 
wrote:

On 20 November 2017 at 18:35, atorikoshi
 wrote:

I put many queries into one transaction and made ReorderBuffer spill
data to disk, and sent SIGKILL to postgres before the end of the
transaction.

After starting up postgres again, I observed the files spilled to
data wasn't deleted.


Since this can only happen  on crash exits, and the reorderbuffer data
is
useless after a decoding backend exits, why don't we just recursively
delete
the tree contents on Pg startup?


+1. You would just need an extra step after say
DeleteAllExportedSnapshotFiles() in startup.c. Looks saner to me do so
so as well.


The old files are being removed at startup by
StartupReorderBuffer.



That seems to contradict the statement above, that "after starting up
postgres again, I observed the files spilled to disk weren't deleted".



At the time of assertion failure, the files are not of the
previous run, but they are created after reconnection from the
subscriber.



Are you saying the problem files do not exist when we start up, but are
created and then leaked after logical decoding resumes after an unclean
startup?

... Yes, that does appear to be the case, per the original report:

"7. After a while, we can see the files remaining.
  (Immediately after starting publiser, we can not see these files.)"

I was confused by "remaining". They're not remaining, they've been
re-created.


Sorry for my incorrect description, I should have said re-created.



But if they're re-created, why are they not recreated correctly after an
unclean shutdown? What persistent state is causing that? We should be
clobbering saved reorder buffer temp files, snapshots, etc at startup. The
slot state is pretty simple, it'd just be a bit behind.

The key difference seems to be that we hard-kill the server so it can't
write anything to clog. The xact is implicitly aborted, we never wrote any
xlog record for a commit or abort. The problem is presumably with decoding
xacts that were implicitly aborted by server crash, after we restart the
server and resume decoding.

The assertion failure reported is in ReorderBufferRestoreCleanup, which
makes sense.

Because there's no xlog record of the crash, we never set the buffer's
final_lsn in ReorderBufferCommit or ReorderBufferAbort .


Yeah, that's the same as my analysis.


Note the comment there:

 * NB: Transactions handled here have to have actively aborted (i.e. have
 * produced an abort record). Implicitly aborted transactions are handled
via
 * ReorderBufferAbortOld(); transactions we're just not interested in, but
 * which have committed are handled in ReorderBufferForget().

That's only called from DecodeStandbyOp in response to an xl_running_xacts.

Here's the backtrace.

Core was generated by `postgres: wal sender process postgres [local'.
Program terminated with signal SIGABRT, Aborted.
...
#2  0x008537b7 in ExceptionalCondition
(conditionName=conditionName@entry=0x9fcdf6 "!(txn->final_lsn != 0)",
errorType=errorType@entry=0x89bcb4 "FailedAssertion",
fileName=fileName@entry=0x9fcd04 "reorderbuffer.c",
lineNumber=lineNumber@entry=2576) at assert.c:54
#3  0x006fec02 in ReorderBufferRestoreCleanup
(rb=rb@entry=0x1b4c370, txn=txn@entry=0x1b5c3b8) at reorderbuffer.c:2576
#4  0x00700693 in ReorderBufferCleanupTXN (rb=rb@entry=0x1b4c370,
txn=txn@entry=0x1b5c3b8) at reorderbuffer.c:1145
#5  0x00701516 in ReorderBufferAbortOld (rb=0x1b4c370,
oldestRunningXid=558) at reorderbuffer.c:1730
#6  0x006f5a47 in DecodeStandbyOp (ctx=0x1af9ce0,
buf=buf@entry=0x7ffd11761200) at decode.c:325
#7  0x006f65bf in LogicalDecodingProcessRecord (ctx=,
record=) at decode.c:117
#8  0x007098ab in XLogSendLogical () at walsender.c:2766
#9  0x0070a875 in WalSndLoop (send_data=send_data@entry=0x709857
) at walsender.c:2134
#10 0x0070b011 in StartLogicalReplication (cmd=cmd@entry=0x1a9cd68)
at walsender.c:1101
#11 0x0070b46f in exec_replication_command
(cmd_string=cmd_string@entry=0x1afec30 "START_REPLICATION SLOT \"sub\"
LOGICAL 0/0 (proto_version '1', publication_names '\"pub\"')") at
walsender.c:1527
#12 0x00758809 in PostgresMain (argc=,
argv=argv@entry=0x1aab870, dbname=, username=)
at postgres.c:4086
#13 0x006e178d in BackendRun (port=port@entry=0x1aa3430) at
postmaster.c:4357
#14 0x006e35e9 in BackendStartup (port=port@entry=0x1aa3430) at
postmaster.c:4029
#15 0x006e39e3 in ServerLoop () at postmaster.c:1753
#16 0x006e4b36 in PostmasterMain (arg

Re: Failed to delete old ReorderBuffer spilled files

2017-11-21 Thread atorikoshi

Thanks for reviewing!


On 2017/11/21 18:12, Masahiko Sawada wrote:

On Tue, Nov 21, 2017 at 3:48 PM, Masahiko Sawada  wrote:

On Mon, Nov 20, 2017 at 7:35 PM, atorikoshi
 wrote:

Hi,

I put many queries into one transaction and made ReorderBuffer spill
data to disk, and sent SIGKILL to postgres before the end of the
transaction.

After starting up postgres again, I observed the files spilled to
data wasn't deleted.

I think these files should be deleted because its transaction was no
more valid, so no one can use these files.


Below is a reproduction instructions.


1. Create table and publication at publiser.

  @pub =# CREATE TABLE t1(
  id INT PRIMARY KEY,
  name TEXT);

  @pub =# CREATE PUBLICATION pub FOR TABLE t1;

2. Create table and subscription at subscriber.

  @sub =# CREATE TABLE t1(
  id INT PRIMARY KEY,
  name TEXT
  );

  @sub =# CREATE SUBSCRIPTION sub
  CONNECTION 'host=[hostname] port=[port] dbname=[dbname]'
  PUBLICATION pub;

3. Put many queries into one transaction.

  @pub =# BEGIN;
INSERT INTO t1
SELECT
  i,
  'aa'
FROM
generate_series(1, 100) as i;

4. Then we can see spilled files.

  @pub $ ls -1 ${PGDATA}/pg_replslot/sub/
state
xid-561-lsn-0-100.snap
xid-561-lsn-0-200.snap
xid-561-lsn-0-300.snap
xid-561-lsn-0-400.snap
xid-561-lsn-0-500.snap
xid-561-lsn-0-600.snap
xid-561-lsn-0-700.snap
xid-561-lsn-0-800.snap
xid-561-lsn-0-900.snap

5. Kill publisher's postgres process before COMMIT.

  @pub $ kill -s SIGKILL [pid of postgres]

6. Start publisher's postgres process.

  @pub $ pg_ctl start -D ${PGDATA}

7. After a while, we can see the files remaining.
  (Immediately after starting publiser, we can not see these files.)

  @pub $ pg_ctl start -D ${PGDATA}

  When I configured with '--enable-cassert', below assertion error
  was appeared.

TRAP: FailedAssertion("!(txn->final_lsn != 0)", File: "reorderbuffer.c",
Line: 2576)


Attached patch sets final_lsn to the last ReorderBufferChange if
final_lsn == 0.


Thank you for the report. I could reproduce this issue with the above
step. My analysis is, the cause of that a serialized reorder buffer
isn't cleaned up is that the aborted transaction without an abort WAL
record has no chance to set ReorderBufferTXN->final_lsn. So if there
is such serialized transactions ReorderBufferRestoreCleanup cleanups
no files, which is cause of the assertion failure (or a file being
orphaned). What do you think?

On detail of your patch, I'm not sure it's safe if we set the lsn of
other than commit record or abort record to final_lsn. The comment in
reorderbuffer.h says,

typedef trcut ReorderBufferTXN
{
(snip)

/* 
 * LSN of the record that lead to this xact to be committed or
 * aborted. This can be a
 * * plain commit record
 * * plain commit record, of a parent transaction
 * * prepared transaction commit
 * * plain abort record
 * * prepared transaction abort
 * * error during decoding
 * 
 */
XLogRecPtr  final_lsn;

But with your patch, we could set a lsn of a record that is other than
what listed above to final_lsn. One way I came up with is to make


I added some comments on final_lsn.


ReorderBufferRestoreCleanup accept an invalid value of final_lsn and
regards it as a aborted transaction that doesn't has a abort WAL
record. So we can cleanup all serialized files if final_lsn of a
transaction is invalid.


After more thought, since there are some codes cosmetically setting
final_lsn when the fate of transaction is determined possibly we
should not accept a invalid value of final_lsn even in the case.



My new patch keeps setting final_lsn, but changed its location to the
top of ReorderBufferCleanupTXN().
I think it's a kind of preparation, so doing it at the top improves
readability.


Anyway I think you should register this patch to the next commit fest so as not 
forget.


Thanks for you advice, I've registered this issue as a bug.

Regards,

--
Atsushi Torikoshi
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

diff --git a/src/backend/replication/logical/reorderbuffer.c 
b/src/backend/replication/logical/reorderbuffer.c
index 0f607ba..bbeb494 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -1075,6 +1075,21 @@ ReorderBufferCleanupTXN(ReorderBuffer *rb, 
ReorderBufferTXN *txn)
boolfound;
dlist_mutable_iter iter;
 
+   /*
+* Before cleaning up, do a preparation.
+* If this transaction encountered crash during transaction,
+* txn->final_lsn remains initial value.
+* 

Failed to delete old ReorderBuffer spilled files

2017-11-20 Thread atorikoshi

Hi,

I put many queries into one transaction and made ReorderBuffer spill
data to disk, and sent SIGKILL to postgres before the end of the
transaction.

After starting up postgres again, I observed the files spilled to
data wasn't deleted.

I think these files should be deleted because its transaction was no
more valid, so no one can use these files.


Below is a reproduction instructions.


1. Create table and publication at publiser.

  @pub =# CREATE TABLE t1(
  id INT PRIMARY KEY,
  name TEXT);

  @pub =# CREATE PUBLICATION pub FOR TABLE t1;

2. Create table and subscription at subscriber.

  @sub =# CREATE TABLE t1(
  id INT PRIMARY KEY,
  name TEXT
  );

  @sub =# CREATE SUBSCRIPTION sub
  CONNECTION 'host=[hostname] port=[port] dbname=[dbname]'
  PUBLICATION pub;

3. Put many queries into one transaction.

  @pub =# BEGIN;
INSERT INTO t1
SELECT
  i,
  'aa'
FROM
generate_series(1, 100) as i;

4. Then we can see spilled files.

  @pub $ ls -1 ${PGDATA}/pg_replslot/sub/
state
xid-561-lsn-0-100.snap
xid-561-lsn-0-200.snap
xid-561-lsn-0-300.snap
xid-561-lsn-0-400.snap
xid-561-lsn-0-500.snap
xid-561-lsn-0-600.snap
xid-561-lsn-0-700.snap
xid-561-lsn-0-800.snap
xid-561-lsn-0-900.snap

5. Kill publisher's postgres process before COMMIT.

  @pub $ kill -s SIGKILL [pid of postgres]

6. Start publisher's postgres process.

  @pub $ pg_ctl start -D ${PGDATA}

7. After a while, we can see the files remaining.
  (Immediately after starting publiser, we can not see these files.)

  @pub $ pg_ctl start -D ${PGDATA}

  When I configured with '--enable-cassert', below assertion error
  was appeared.

TRAP: FailedAssertion("!(txn->final_lsn != 0)", File:  
"reorderbuffer.c", Line: 2576)



Attached patch sets final_lsn to the last ReorderBufferChange if
final_lsn == 0.

--
Atsushi Torikoshi
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

diff --git a/src/backend/replication/logical/reorderbuffer.c 
b/src/backend/replication/logical/reorderbuffer.c
index 0f607ba..51d5b71 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -1093,6 +1093,20 @@ ReorderBufferCleanupTXN(ReorderBuffer *rb, 
ReorderBufferTXN *txn)
ReorderBufferCleanupTXN(rb, subtxn);
}
 
+   /*
+* If this transaction encountered crash during transaction,
+* txn->final_lsn remains initial value.
+* To properly remove entries which were spilled to disk, we need valid
+* final_lsn.
+* So we set final_lsn to the lsn of the last ReorderBufferChange.
+*/
+   if (txn->final_lsn == 0)
+   {
+   ReorderBufferChange *last_change =
+   dlist_tail_element(ReorderBufferChange, node, &txn->changes);
+   txn->final_lsn = last_change->lsn;
+   }
+
/* cleanup changes in the toplevel txn */
dlist_foreach_modify(iter, &txn->changes)
{