Re: [HACKERS] [PATCH 06/16] Add support for a generic wal reading facility dubbed XLogReader

2012-06-18 Thread Heikki Linnakangas

On 15.06.2012 00:38, Andres Freund wrote:

On Thursday, June 14, 2012 11:19:00 PM Heikki Linnakangas wrote:

On 13.06.2012 14:28, Andres Freund wrote:

Features:
- streaming reading/writing
- filtering
- reassembly of records

Reusing the ReadRecord infrastructure in situations where the code that
wants to do so is not tightly integrated into xlog.c is rather hard and
would require changes to rather integral parts of the recovery code
which doesn't seem to be a good idea.

It would be nice refactor ReadRecord and its subroutines out of xlog.c.
That file has grown over the years to be really huge, and separating the
code to read WAL sounds like it should be a pretty natural split. I
don't want to duplicate all the WAL reading code, so we really should
find a way to reuse that. I'd suggest rewriting ReadRecord into a thin
wrapper that just calls the new xlogreader code.

I aggree that it is not very nice to duplicate it. But I also don't want to go
the route of replacing ReadRecord with it for a while, we can replace
ReadRecord later if we want. As long as it is in flux like it is right now I
don't really see the point in investing energy in it.
Also I am not that sure how a callback oriented API fits into the xlog.c
workflow?


If the API doesn't fit the xlog.c workflow, I think that's a pretty good 
indication that the API is not good. The xlog.c workflow is basically:


while(!end-of-wal)
{
  record = ReadRecord();
  redo(recode);
}

There's some pretty complicated logic within the bowels of ReadRecord 
(see XLogPageRead(), for example), but it seems to me those parts 
actually fit the your callback API pretty well. The biggest change is 
that the xlog-reader thingie should return to the caller after each 
record, instead of calling a callback for each read record and only 
returning at end-of-wal.


Or we could put the code to call the redo-function into the callback, 
but there's some also some logic there to exit early if you reach the 
desired recovery point, for example, so the callback API would need to 
be extended to allow such early exit. I think a return-after-each-record 
API would be better, though.



Can this be used for writing WAL, as well as reading? If so, what do you
need the write support for?

It currently can replace records which are not interesting (e.g. index changes
in the case of logical rep). Filtered records are replaced with XLOG_NOOP
records with correct length currently. In future the actual amount of data
should really be reduced. I don't know yet know how to map LSNs of
uncompressed/compressed stream onto each other...
The filtered data is then passed to a writeout callback (in a streaming
fashion).

The whole writing out part is pretty ugly at the moment and I just bolted it
ontop because it was convenient for the moment. I am not yet sure how the api
for that should look


Can we just leave that out for now?

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH 06/16] Add support for a generic wal reading facility dubbed XLogReader

2012-06-18 Thread Andres Freund
On Monday, June 18, 2012 01:51:45 PM Heikki Linnakangas wrote:
 On 15.06.2012 00:38, Andres Freund wrote:
  On Thursday, June 14, 2012 11:19:00 PM Heikki Linnakangas wrote:
  On 13.06.2012 14:28, Andres Freund wrote:
  Features:
  - streaming reading/writing
  - filtering
  - reassembly of records
  
  Reusing the ReadRecord infrastructure in situations where the code that
  wants to do so is not tightly integrated into xlog.c is rather hard and
  would require changes to rather integral parts of the recovery code
  which doesn't seem to be a good idea.
  
  It would be nice refactor ReadRecord and its subroutines out of xlog.c.
  That file has grown over the years to be really huge, and separating the
  code to read WAL sounds like it should be a pretty natural split. I
  don't want to duplicate all the WAL reading code, so we really should
  find a way to reuse that. I'd suggest rewriting ReadRecord into a thin
  wrapper that just calls the new xlogreader code.
  
  I aggree that it is not very nice to duplicate it. But I also don't want
  to go the route of replacing ReadRecord with it for a while, we can
  replace ReadRecord later if we want. As long as it is in flux like it is
  right now I don't really see the point in investing energy in it.
  Also I am not that sure how a callback oriented API fits into the xlog.c
  workflow?
 
 If the API doesn't fit the xlog.c workflow, I think that's a pretty good
 indication that the API is not good. The xlog.c workflow is basically:
 
 while(!end-of-wal)
 {
record = ReadRecord();
redo(recode);
 }
 
 There's some pretty complicated logic within the bowels of ReadRecord
 (see XLogPageRead(), for example), but it seems to me those parts
 actually fit the your callback API pretty well. The biggest change is
 that the xlog-reader thingie should return to the caller after each
 record, instead of calling a callback for each read record and only
 returning at end-of-wal.
I did it that way at start but it seemed to move too much knowledge to the 
callsites.
Its also something of an efficiency/complexity thing: Either you alway reread 
the current page on entering XLogReader (because it could have changed if it 
was the last one at some point) which is noticeable performancewise or you 
make the contract more complex by requiring to notify the XLogReader about the 
fact that you changed the end-of-valid data which doesn't seem to be a good 
idea because I think we will rather quickly will grow more callsites.

 Or we could put the code to call the redo-function into the callback,
 but there's some also some logic there to exit early if you reach the
 desired recovery point, for example, so the callback API would need to
 be extended to allow such early exit. I think a return-after-each-record
 API would be better, though.
Adding an early exit would be easy.

  Can this be used for writing WAL, as well as reading? If so, what do you
  need the write support for?
  
  It currently can replace records which are not interesting (e.g. index
  changes in the case of logical rep). Filtered records are replaced with
  XLOG_NOOP records with correct length currently. In future the actual
  amount of data should really be reduced. I don't know yet know how to
  map LSNs of uncompressed/compressed stream onto each other...
  The filtered data is then passed to a writeout callback (in a streaming
  fashion).
  
  The whole writing out part is pretty ugly at the moment and I just bolted
  it ontop because it was convenient for the moment. I am not yet sure how
  the api for that should look

 Can we just leave that out for now?
Not easily I think. I should probably get busy writing up a PoC for separating 
that.

Thanks,

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH 06/16] Add support for a generic wal reading facility dubbed XLogReader

2012-06-14 Thread Heikki Linnakangas

On 13.06.2012 14:28, Andres Freund wrote:

Features:
- streaming reading/writing
- filtering
- reassembly of records

Reusing the ReadRecord infrastructure in situations where the code that wants
to do so is not tightly integrated into xlog.c is rather hard and would require
changes to rather integral parts of the recovery code which doesn't seem to be
a good idea.


It would be nice refactor ReadRecord and its subroutines out of xlog.c. 
That file has grown over the years to be really huge, and separating the 
code to read WAL sounds like it should be a pretty natural split. I 
don't want to duplicate all the WAL reading code, so we really should 
find a way to reuse that. I'd suggest rewriting ReadRecord into a thin 
wrapper that just calls the new xlogreader code.



Missing:
- compressing the stream when removing uninteresting records
- writing out correct CRCs
- validating CRCs
- separating reader/writer


- comments.

At a quick glance, I couldn't figure out how this works. There seems to 
be some callback functions? If you want to read an xlog stream using 
this facility, what do you do? Can this be used for writing WAL, as well 
as reading? If so, what do you need the write support for?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH 06/16] Add support for a generic wal reading facility dubbed XLogReader

2012-06-14 Thread Andres Freund
On Thursday, June 14, 2012 11:19:00 PM Heikki Linnakangas wrote:
 On 13.06.2012 14:28, Andres Freund wrote:
  Features:
  - streaming reading/writing
  - filtering
  - reassembly of records
  
  Reusing the ReadRecord infrastructure in situations where the code that
  wants to do so is not tightly integrated into xlog.c is rather hard and
  would require changes to rather integral parts of the recovery code
  which doesn't seem to be a good idea.
 It would be nice refactor ReadRecord and its subroutines out of xlog.c.
 That file has grown over the years to be really huge, and separating the
 code to read WAL sounds like it should be a pretty natural split. I
 don't want to duplicate all the WAL reading code, so we really should
 find a way to reuse that. I'd suggest rewriting ReadRecord into a thin
 wrapper that just calls the new xlogreader code.
I aggree that it is not very nice to duplicate it. But I also don't want to go 
the route of replacing ReadRecord with it for a while, we can replace 
ReadRecord later if we want. As long as it is in flux like it is right now I 
don't really see the point in investing energy in it.
Also I am not that sure how a callback oriented API fits into the xlog.c 
workflow?

  Missing:
  - compressing the stream when removing uninteresting records
  - writing out correct CRCs
  - validating CRCs
  - separating reader/writer
 
 - comments.
 At a quick glance, I couldn't figure out how this works. There seems to
 be some callback functions? If you want to read an xlog stream using
 this facility, what do you do?
You currently have to fill out 4 callbacks:

XLogReaderStateInterestingCB is_record_interesting;
XLogReaderStateWriteoutCB writeout_data;
XLogReaderStateFinishedRecordCB finished_record;
XLogReaderStateReadPageCB read_page;

As an example how to use it (from the walsender support for 
START_LOGICAL_REPLICATION):

if(!xlogreader_state){
xlogreader_state = XLogReaderAllocate();
xlogreader_state-is_record_interesting = 
RecordRelevantForLogicalReplication;
xlogreader_state-finished_record = ProcessRecord;
xlogreader_state-writeout_data = WriteoutData;
xlogreader_state-read_page = XLogReadPage;

/* startptr is the current XLog position */
xlogreader_state-startptr = startptr;

XLogReaderReset(xlogreader_state);
}

/* how far does valid data go */
xlogreader_state-endptr = endptr;

XLogReaderRead(xlogreader_state);

The last step will then call the above callbacks till it reaches endptr. I.e. 
it first reads a page with read_page; then checks whether a record is 
interesting for the use-case (is_record_interesting); in case it is 
interesting, it gets reassembled and passed to the finished_record callback. 
Then the bytestream gets written out again with writeout_data.

In this case it gets written to the buffer the walsender has allocated. In 
others it might just get thrown away.

 Can this be used for writing WAL, as well as reading? If so, what do you
 need the write support for?
It currently can replace records which are not interesting (e.g. index changes 
in the case of logical rep). Filtered records are replaced with XLOG_NOOP 
records with correct length currently. In future the actual amount of data 
should really be reduced. I don't know yet know how to map LSNs of 
uncompressed/compressed stream onto each other...
The filtered data is then passed to a writeout callback (in a streaming 
fashion).

The whole writing out part is pretty ugly at the moment and I just bolted it 
ontop because it was convenient for the moment. I am not yet sure how the api 
for that should look

Andres

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH 06/16] Add support for a generic wal reading facility dubbed XLogReader

2012-06-14 Thread Andres Freund
On Thursday, June 14, 2012 11:19:00 PM Heikki Linnakangas wrote:
 On 13.06.2012 14:28, Andres Freund wrote:
  Features:
  - streaming reading/writing
  - filtering
  - reassembly of records
  
  Reusing the ReadRecord infrastructure in situations where the code that
  wants to do so is not tightly integrated into xlog.c is rather hard and
  would require changes to rather integral parts of the recovery code
  which doesn't seem to be a good idea.
 
 It would be nice refactor ReadRecord and its subroutines out of xlog.c.
 That file has grown over the years to be really huge, and separating the
 code to read WAL sounds like it should be a pretty natural split. I
 don't want to duplicate all the WAL reading code, so we really should
 find a way to reuse that. I'd suggest rewriting ReadRecord into a thin
 wrapper that just calls the new xlogreader code.
 
  Missing:
  - compressing the stream when removing uninteresting records
  - writing out correct CRCs
  - validating CRCs
  - separating reader/writer
 
 - comments.
 
 At a quick glance, I couldn't figure out how this works. There seems to
 be some callback functions? If you want to read an xlog stream using
 this facility, what do you do? Can this be used for writing WAL, as well
 as reading? If so, what do you need the write support for?
Oh, btw, the callbacks and parameters are somewhat documented in the 
xlogreader.h header in the XLogReaderState struct.
Still needs improvement though.

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [PATCH 06/16] Add support for a generic wal reading facility dubbed XLogReader

2012-06-13 Thread Andres Freund
From: Andres Freund and...@anarazel.de

Features:
- streaming reading/writing
- filtering
- reassembly of records

Reusing the ReadRecord infrastructure in situations where the code that wants
to do so is not tightly integrated into xlog.c is rather hard and would require
changes to rather integral parts of the recovery code which doesn't seem to be
a good idea.

Missing:
- compressing the stream when removing uninteresting records
- writing out correct CRCs
- validating CRCs
- separating reader/writer
---
 src/backend/access/transam/Makefile |2 +-
 src/backend/access/transam/xlogreader.c |  914 +++
 src/include/access/xlogreader.h |  173 ++
 3 files changed, 1088 insertions(+), 1 deletion(-)
 create mode 100644 src/backend/access/transam/xlogreader.c
 create mode 100644 src/include/access/xlogreader.h

diff --git a/src/backend/access/transam/Makefile 
b/src/backend/access/transam/Makefile
index f82f10e..660b5fc 100644
--- a/src/backend/access/transam/Makefile
+++ b/src/backend/access/transam/Makefile
@@ -13,7 +13,7 @@ top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
 OBJS = clog.o transam.o varsup.o xact.o rmgr.o slru.o subtrans.o multixact.o \
-   twophase.o twophase_rmgr.o xlog.o xlogfuncs.o xlogutils.o
+   twophase.o twophase_rmgr.o xlog.o xlogfuncs.o xlogreader.o xlogutils.o
 
 include $(top_srcdir)/src/backend/common.mk
 
diff --git a/src/backend/access/transam/xlogreader.c 
b/src/backend/access/transam/xlogreader.c
new file mode 100644
index 000..6f15d66
--- /dev/null
+++ b/src/backend/access/transam/xlogreader.c
@@ -0,0 +1,914 @@
+/*-
+ *
+ * xlogreader.c
+ *
+ * Aa somewhat generic xlog read interface
+ *
+ * Portions Copyright (c) 2010-2012, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *   src/backend/access/transam/readxlog.c
+ *
+ *-
+ *
+ * FIXME:
+ * * CRC computation
+ * * separation of reader/writer
+ */
+
+#include postgres.h
+
+#include access/xlog_internal.h
+#include access/transam.h
+#include catalog/pg_control.h
+#include access/xlogreader.h
+
+/* FIXME */
+#include replication/walsender_private.h
+#include replication/walprotocol.h
+
+//#define VERBOSE_DEBUG
+
+XLogReaderState* XLogReaderAllocate(void)
+{
+   XLogReaderState* state = 
(XLogReaderState*)malloc(sizeof(XLogReaderState));
+   int i;
+
+   if (!state)
+   goto oom;
+
+   memset(state-buf.record, 0, sizeof(XLogRecord));
+   state-buf.record_data_size = XLOG_BLCKSZ*8;
+   state-buf.record_data =
+   malloc(state-buf.record_data_size);
+
+   if (!state-buf.record_data)
+   goto oom;
+
+   if (!state)
+   goto oom;
+
+   for (i = 0; i  XLR_MAX_BKP_BLOCKS; i++)
+   {
+   state-buf.bkp_block_data[i] =
+   malloc(BLCKSZ);
+
+   if (!state-buf.bkp_block_data[i])
+   goto oom;
+   }
+   XLogReaderReset(state);
+   return state;
+
+oom:
+   elog(ERROR, could not allocate memory for XLogReaderState);
+   return 0;
+}
+
+void XLogReaderReset(XLogReaderState* state)
+{
+   state-in_record = false;
+   state-in_bkp_blocks = 0;
+   state-in_bkp_block_header = false;
+   state-in_skip = false;
+   state-remaining_size = 0;
+   state-nbytes = 0;
+   state-incomplete = false;
+   state-initialized = false;
+   state-needs_input = false;
+   state-needs_output = false;
+}
+
+static inline bool
+XLogReaderHasInput(XLogReaderState* state, Size size)
+{
+   XLogRecPtr tmp = state-curptr;
+   XLByteAdvance(tmp, size);
+   if (XLByteLE(state-endptr, tmp))
+   return false;
+   return true;
+}
+
+static inline bool
+XLogReaderHasOutput(XLogReaderState* state, Size size){
+   if (state-nbytes + size  MAX_SEND_SIZE)
+   return false;
+   return true;
+}
+
+static inline bool
+XLogReaderHasSpace(XLogReaderState* state, Size size)
+{
+   XLogRecPtr tmp = state-curptr;
+   XLByteAdvance(tmp, size);
+   if (XLByteLE(state-endptr, tmp))
+   return false;
+   else if (state-nbytes + size  MAX_SEND_SIZE)
+   return false;
+   return true;
+}
+
+void
+XLogReaderRead(XLogReaderState* state)
+{
+   XLogRecord* temp_record;
+
+   state-needs_input = false;
+   state-needs_output = false;
+
+   /*
+* Do some basic sanity checking and setup if were starting anew.
+*/
+   if (!state-initialized)
+   {
+   state-initialized = true;
+   /*
+* we need to start reading at the beginning of the page to 
understand
+* what we are currently reading. We will skip over that 
because we
+* check curptr  startptr