Re: [HACKERS] a question about relkind of RelationData handed over to heap_update function

노홍찬 Mon, 26 Oct 2009 18:35:46 -0700

Dear Greg Stark,

You are totally right. I want to record all of the updated regions,
so this is not a small amount of work.

However, I am trying to touch the existing code as little as I can.
So I have mostly added new code. I did not change the MarkBufferDirty function at 
all, but I did write a new function that is supposed to work 
similarly to what you described.

I am sorry, but I couldn't understand the meaning of the following sentence: "The 
'some more work' may be some function call which doesn't usually do much 
either."
What did you mean by it? Please excuse my poor English, and it would be great 
if you could explain it once more.

So far, this is what I have done: I appended several fields to the BufferDesc 
structure, and my own structure (IclNewLog) is used to record those dirty 
regions.

------------ ------------ ------------ ------------ ------------ ------------ 
------------ ------------ ------------ ------------ ------------ ------------

/* hongs added: defined first, because BufferDesc embeds an array of it */
typedef struct IclNewLog {
        int change_start;       /* first byte offset of the changed region */
        int change_end;         /* one past the last changed byte */
        uint32 file;            /* for ICL_DEBUG */
        int line;               /* for ICL_DEBUG */
        int icl_log_global_seq; /* for ICL_DEBUG */
} IclNewLog;

typedef struct sbufdesc {
        BufferTag tag; /* ID of page contained in buffer */
        BufFlags flags; /* see bit definitions above */
        uint16 usage_count; /* usage counter for clock sweep code */
        unsigned refcount; /* # of backends holding pins on buffer */
        int wait_backend_pid; /* backend PID of pin-count waiter */

        slock_t buf_hdr_lock; /* protects the above fields */

        int buf_id; /* buffer's index number (from 0) */
        int freeNext; /* link in freelist chain */

        LWLockId io_in_progress_lock; /* to wait for I/O to complete */
        LWLockId content_lock; /* to lock access to buffer contents */

        /* hongs added */
#ifdef USE_ICL
        bool isBufferPageNewOrXlogRead;
        int     icl_length;                 /* number of icl_logs entries in use */
        IclNewLog icl_logs[ICL_LEN_LIMIT];  /* recorded dirty regions */
#endif
        /* hongs added end */

} BufferDesc;

------------ ------------ ------------ ------------ ------------ ------------ 
------------ ------------ ------------ ------------ ------------ ------------

* Part of the heap_update function (heapam.c) * 

/* heapam.c line 2761 (existing code) */
oldtup.t_data->t_ctid = heaptup->t_self;

/* hongs added; ICL logs the old tuple's tuple header */
#ifdef USE_ICL
        if (doIcl) {
                /*
                 * The buffer header lock and the buffer content lock are
                 * separate, so I guess the buffer header lock is needed here.
                 * bufHdr is the BufferDesc of 'buffer' (set up earlier, not shown).
                 */
                LockBufHdr(bufHdr);
                if (bufHdr->icl_length < ICL_LEN_LIMIT) {
                        IclNewLog *entry = &bufHdr->icl_logs[bufHdr->icl_length];

                        entry->change_start = ItemIdGetOffset(lp);
                        entry->change_end = ItemIdGetOffset(lp) + sizeof(HeapTupleHeaderData);
                        entry->file = HEAPAM;
                        entry->line = __LINE__;
                        /* make sure the logged range is valid */
                        IclAssert(IsIclLogValid(*entry));
                        bufHdr->icl_length++;
                }
                UnlockBufHdr(bufHdr);
        }
#endif
/* hongs added end */

/* heapam.c lines 2762-2764 (existing code) */
if (newbuf != buffer)
        MarkBufferDirty(newbuf);
MarkBufferDirty(buffer);
------------ ------------ ------------ ------------ ------------ ------------ 
------------ ------------ ------------ ------------ ------------ ------------

I call this log the "icl log".
The code above records "the update to the old tuple's tuple header" in the 
log array of the buffer descriptor whose page is about to be marked 
dirty.
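To make the bookkeeping in the heap_update excerpt easier to follow in isolation, here is a standalone sketch of appending one range to a fixed-size per-buffer log. It is not the code from my patch: `IclLogSketch`, `IclRangeSketch`, `icl_range_is_valid` and `icl_append` are hypothetical names, the limit of 20 entries is the one mentioned below, ranges are assumed to be half-open [start, end), and the 8192-byte page assumes the default BLCKSZ. The buffer header spinlock that the real code takes is left out.

```c
#include <stdbool.h>

#define SKETCH_BLCKSZ        8192  /* assumed page size (PostgreSQL default BLCKSZ) */
#define SKETCH_ICL_LEN_LIMIT 20    /* maximum number of ranges kept per buffer */

/* One dirty byte range [change_start, change_end) within a page. */
typedef struct
{
    int change_start;
    int change_end;
} IclRangeSketch;

/* Hypothetical stand-in for the fields added to BufferDesc. */
typedef struct
{
    int            icl_length;
    IclRangeSketch icl_logs[SKETCH_ICL_LEN_LIMIT];
} IclLogSketch;

/* A range is valid if it is non-empty and lies entirely inside the page. */
static bool
icl_range_is_valid(int start, int end)
{
    return start >= 0 && start < end && end <= SKETCH_BLCKSZ;
}

/*
 * Append a range.  Returns false (recording nothing) if the range is
 * invalid or the log is already full.  The caller is assumed to hold
 * whatever lock protects the log.
 */
static bool
icl_append(IclLogSketch *log, int start, int end)
{
    if (!icl_range_is_valid(start, end) || log->icl_length >= SKETCH_ICL_LEN_LIMIT)
        return false;
    log->icl_logs[log->icl_length].change_start = start;
    log->icl_logs[log->icl_length].change_end = end;
    log->icl_length++;
    return true;
}
```

Note that the full check uses `>= SKETCH_ICL_LEN_LIMIT`, so all 20 slots of the array can be used.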

I'm not interested in buffers that are updated frequently. I'm interested in 
buffers that are about to be flushed but contain only a very small amount of 
genuinely updated area.
Because pgsql uses a snapshot-based MVCC model, every update also modifies the 
old tuple's header (it sets the old tuple's xmax field).
So there may be buffer pages about to be flushed that contain only one or two 
small regions of genuine updates, such as an updated xmax field or an updated 
XLogRecPtr (the page header's pd_lsn).
In my opinion, flushes in which so little of the page has genuinely changed 
are inefficient.
However, this is not a problem unique to pgsql: the in-place update 
operations of every DBMS have similar problems.
I think pgsql's update logic is less problematic than others,
since the main updates (the new tuple versions, not the old tuples' header 
updates) can be accumulated in one buffer page instead of scattered pages, 
and the HOT (heap-only tuple) update mechanism addresses the earlier problems 
of snapshot-based MVCC in pgsql well.
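To put a rough number on "a very small amount of genuinely updated area", the sketch below counts how many bytes of a page are covered by the logged ranges, using a byte map so that overlapping ranges are not counted twice. With the two regions from the diagram in my original question (20 ~ 80 and 8190 ~ 8192), only 62 of 8192 bytes have changed, which is under 1% of the page. `IclCountRange`, `dirty_bytes` and `dirty_fraction` are hypothetical names; half-open ranges and an 8192-byte page are assumptions.

```c
#include <stdbool.h>

#define COUNT_BLCKSZ 8192   /* assumed page size (PostgreSQL default BLCKSZ) */

typedef struct
{
    int change_start;   /* inclusive */
    int change_end;     /* exclusive */
} IclCountRange;

/*
 * Count the bytes of a page covered by at least one range.  Ranges are
 * assumed to be valid: 0 <= change_start < change_end <= COUNT_BLCKSZ.
 */
static int
dirty_bytes(const IclCountRange *ranges, int n)
{
    bool covered[COUNT_BLCKSZ] = {false};   /* one flag per byte of the page */
    int  total = 0;

    for (int i = 0; i < n; i++)
        for (int b = ranges[i].change_start; b < ranges[i].change_end; b++)
            covered[b] = true;
    for (int b = 0; b < COUNT_BLCKSZ; b++)
        total += covered[b];
    return total;
}

/* Fraction of the page that actually changed, between 0 and 1. */
static double
dirty_fraction(const IclCountRange *ranges, int n)
{
    return (double) dirty_bytes(ranges, n) / COUNT_BLCKSZ;
}
```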

Therefore, I limited the maximum log array size to 20. If I apply some log-merging 
logic (because many logs can be merged together, e.g. 
8152 ~ 8172 and 8162 ~ 8192 -> 8152 ~ 8192), 
then this array size should be enough to identify the buffers that have only small 
genuinely updated regions. I don't care about buffers that have more logs than 
the maximum log array size.
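The merge step mentioned above could look roughly like this: sort the logged ranges by start offset, then fold together any that overlap or touch, so that 8152 ~ 8172 and 8162 ~ 8192 become a single 8152 ~ 8192 entry. This is only an illustrative sketch: `IclMergeRange`, `range_cmp` and `merge_ranges` are hypothetical names, ranges are assumed to be half-open [start, end), and in the real buffer descriptor it would have to run while holding the buffer header lock.

```c
#include <stdlib.h>

typedef struct
{
    int change_start;   /* inclusive */
    int change_end;     /* exclusive */
} IclMergeRange;

/* qsort comparator: order ranges by start offset, then by end offset. */
static int
range_cmp(const void *a, const void *b)
{
    const IclMergeRange *x = a;
    const IclMergeRange *y = b;

    if (x->change_start != y->change_start)
        return (x->change_start > y->change_start) - (x->change_start < y->change_start);
    return (x->change_end > y->change_end) - (x->change_end < y->change_end);
}

/*
 * Sort the ranges and merge, in place, any that overlap or are adjacent.
 * Returns the new number of ranges.
 */
static int
merge_ranges(IclMergeRange *r, int n)
{
    int out = 0;

    if (n == 0)
        return 0;
    qsort(r, (size_t) n, sizeof(IclMergeRange), range_cmp);
    for (int i = 1; i < n; i++)
    {
        if (r[i].change_start <= r[out].change_end)
        {
            /* overlapping or adjacent: extend the current merged range */
            if (r[i].change_end > r[out].change_end)
                r[out].change_end = r[i].change_end;
        }
        else
            r[++out] = r[i];    /* gap: start a new merged range */
    }
    return out + 1;
}
```

One possible design choice is to run the merge only when the array becomes full, so that appends stay cheap and slots are reclaimed only when needed.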

This is only an example; my current code doesn't look exactly like this. 
I am trying not to touch the existing code and only to add my own logic, so that 
later my code can be applied as an additional module for a specific purpose, such as 
flash-based storage.

I want to emphasize this once more: this attempt is not meant as a pgsql patch or 
enhancement but is for my own research, at least for now.
Besides, it is only preparation for implementing my research idea.
So if you see a lot of inefficiency or clumsiness in it, please 
bear with me. 
Later, when I am confident enough to show the whole picture of my idea together 
with working code (at least after it passes the regression tests and my own tests 
using the dbt2 benchmark),
I'll present it to you and to the hackers list.

I really appreciate your interest in my attempt.

As for my original question, I found my mistake: I had confused the relation OID with 
relNode (of RelFileNode). Sorry for the hasty question.

Thank you for reading this.

- Best Regards
  Hongchan Roh -

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Greg Stark
Sent: Tuesday, October 27, 2009 2:22 AM
To: 노홍찬
Cc: [email protected]
Subject: Re: a question about relkind of RelationData handed over to 
heap_update function

On Sun, Oct 25, 2009 at 9:37 AM, 노홍찬 <[email protected]> wrote:
> What I am trying to do now is to examine the real dirty portion of buffer 
> pages to be flushed like the following.
>
>   page 1
> -------------
> |           |   dportion1 (real dirty portion 1) ranges between 20 ~ 80
> | dportion1 |
> |           |   dportion2 (real dirty portion 2) ranges between 8190 ~ 8192
> |           |
> | dportion2 |
> -------------
>
> There are many different kinds of page updates, such as updates to local 
> buffers, temp relations, indexes, toasted attributes, and so forth.
>
> It would be a big burden for me to inspect all of that code.
>
> Therefore, I decided to start by inspecting only updates to 
> ordinary tables.
>
> I added a log array field to the BufferDesc struct, and I add logs to the 
> BufferDesc of the updated buffer
>
> when an ordinary table is updated (the logs specify the ranges of the buffer's 
> real dirty portions).
>

I would think you would want to modify MarkBufferDirty to take a start
and end point and store that in your log. Then modify every existing
MarkBufferDirty operation that you can to specify the range that the
subsequent operation is going to modify. You're going to run into
problems where you have code which looks like:

 - mark buffer dirty
 - do some work which modifies a predictable portion
 - if (some rare condition)
    - do some more work which modifies other parts of the buffer

The "some more work" may be some function call which doesn't usually
do much either.

So you may end up having to restructure a lot of code so that every
function is responsible for marking the buffer range dirty itself
instead of assuming it's already been marked.
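The suggestion above can be sketched in isolation as follows. `SketchBuffer`, `mark_buffer_dirty_range`, `rare_extra_work` and `update_with_optional_work` are hypothetical stand-ins; PostgreSQL's real MarkBufferDirty takes only a Buffer and records no ranges. The point is the restructuring: the rarely executed helper marks the bytes it writes itself instead of relying on its caller, and an overflowing log falls back to treating the whole page as dirty.

```c
#include <stdbool.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 8192   /* assumed page size */
#define SKETCH_LOG_LIMIT 20     /* assumed maximum number of logged ranges */

typedef struct
{
    char data[SKETCH_PAGE_SIZE];
    bool dirty;
    int  nranges;
    int  start[SKETCH_LOG_LIMIT];   /* inclusive */
    int  end[SKETCH_LOG_LIMIT];     /* exclusive */
    bool overflow;                  /* log full: whole page must be treated as dirty */
} SketchBuffer;

/* Hypothetical range-aware variant of MarkBufferDirty. */
static void
mark_buffer_dirty_range(SketchBuffer *buf, int start, int end)
{
    buf->dirty = true;
    if (buf->nranges < SKETCH_LOG_LIMIT)
    {
        buf->start[buf->nranges] = start;
        buf->end[buf->nranges] = end;
        buf->nranges++;
    }
    else
        buf->overflow = true;
}

/* The rarely taken extra work marks the byte it modifies itself ... */
static void
rare_extra_work(SketchBuffer *buf, int off, char value)
{
    mark_buffer_dirty_range(buf, off, off + 1);
    buf->data[off] = value;
}

/* ... so the caller only marks the predictable portion it writes. */
static void
update_with_optional_work(SketchBuffer *buf, int off, const char *src, int len,
                          bool rare_condition)
{
    mark_buffer_dirty_range(buf, off, off + len);
    memcpy(buf->data + off, src, (size_t) len);
    if (rare_condition)
        rare_extra_work(buf, SKETCH_PAGE_SIZE - 1, 'x');
}
```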

-- 
greg

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers