Re: [HACKERS] Block write statistics WIP

2013-07-05 Thread Greg Smith

On 7/1/13 3:10 AM, Satoshi Nagayasu wrote:

Or should we add some pointer, which is accociated with the Relation,
into BufferDesc? Maybe OID?


That is the other option here, I looked at it but didn't like it.  The 
problem is that at the point when a new page is created, it's not always 
clear yet what relation it's going to hold.  That means that if the 
buffer page itself is where you look up the relation OID, every code 
path that manipulates relation IDs will need to worry about maintaining 
this information.  It's possible to do it that way, but I don't know 
that it's any less work than making all the write paths keep track of 
it.  It would need some extra locking around updating the relation tag 
in the buffer pages too, and I'd prefer not to


The other thing that I like about the writing side is that I can 
guarantee the code is correct pretty easily.  Yes, it's going to take 
days worth of modifying writing code.  But the compile will force you to 
fix all of them, and once they're all updated I'd be surprised if 
something unexpected happened.


If instead the data moves onto the buffer page header instead, we could 
be chasing bugs similar to the relcache ones forever.  Also, as a tie 
breaker, buffer pages will get bigger and require more locking this way. 
 Making writers tell us the relation doesn't need any of that.



I'm thinking of FlushBuffer() too. How can we count physical write
for each relation in FlushBuffer()? Or any other appropriate place?


SMgrRelation is available there, so tagging the relation should be easy 
in this case.



BTW, Greg's first patch could not be applied to the latest HEAD.
I guess it need to have some fix, I couldn't clafiry it though.


Since this is a WIP patch, what I was looking for was general design 
approach feedback, with my bit as a simple example.  I didn't expect 
anyone to spend much time doing a formal review of that quick hack. 
You're welcome to try and play with that code to add things, I'm not 
very attached to it yet.  I've basically gotten what I wanted to here 
from this submission; I'm marking it returned with feedback.  If anyone 
else has comments, I'm still open to discussion here too.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Block write statistics WIP

2013-07-01 Thread Satoshi Nagayasu

Hi,

I'm looking into this patch as a reviewer.

(2013/05/24 10:33), Heikki Linnakangas wrote:

On 23.05.2013 19:10, Greg Smith wrote:

On 5/20/13 7:51 AM, Heikki Linnakangas wrote:
It strikes me as kind of weird that the read side and write side are
not more symmetrical.


It might seem weird if you expect the API to be similar to POSIX read()
and write(), where you can read() an arbitrary block at any location,
and write() an arbitrary block at any location. A better comparison
would be e.g open() and close(). open() returns a file descriptor, which
you pass to other functions to operate on the file. When you're done,
you call close(fd). The file descriptor is completely opaque to the user
program, you do all the operations through the functions that take the
fd as argument. Similarly, ReadBuffer() returns a Buffer, which is
completely opaque to the caller, and you do all the operations through
various functions and macros that operate on the Buffer. When you're
done, you release the buffer with ReleaseBuffer().

(sorry for the digression from the original topic, I don't have any
problem with what adding an optional Relation argument to MarkBuffer if
you need that :-) )


Or should we add some pointer, which is accociated with the Relation,
into BufferDesc? Maybe OID?

I'm thinking of FlushBuffer() too. How can we count physical write
for each relation in FlushBuffer()? Or any other appropriate place?

BTW, Greg's first patch could not be applied to the latest HEAD.
I guess it need to have some fix, I couldn't clafiry it though.

Regards,
--
Satoshi Nagayasu sn...@uptime.jp
Uptime Technologies, LLC. http://www.uptime.jp


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Block write statistics WIP

2013-05-23 Thread Greg Smith

On 5/20/13 7:51 AM, Heikki Linnakangas wrote:

The way that MarkDirty requires this specific underlying storage manager
behavior to work properly strikes me as as a bit of a layering violation
too. I'd like the read and write paths to have a similar API, but here
they don't even operate on the same type of inputs. Addressing that is
probably harder than just throwing a hack on the existing code though.


To be honest, I don't understand what you mean by that. ?


Let's say you were designing a storage layer API from scratch for what 
Postgres does.  That might take a relation as its input, like ReadBuffer 
does.  Hiding the details of how that turns into a physical file 
operation would be a useful goal of such a layer.  I'd then consider it 
a problem if that exposed things like the actual mapping of relations 
into files to callers.


What we actually have right now is this MarkDirty function that operates 
on BufferTag data, which points directly to the underlying file.  I see 
that as cutting the storage API in half and calling a function in the 
middle of the implementation.  It strikes me as kind of weird that the 
read side and write side are not more symmetrical.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Block write statistics WIP

2013-05-23 Thread Heikki Linnakangas

On 23.05.2013 19:10, Greg Smith wrote:

On 5/20/13 7:51 AM, Heikki Linnakangas wrote:

The way that MarkDirty requires this specific underlying storage
manager behavior to work properly strikes me as as a bit of a
layering violation too. I'd like the read and write paths to have
a similar API, but here they don't even operate on the same type
of inputs. Addressing that is probably harder than just throwing
a hack on the existing code though.


To be honest, I don't understand what you mean by that. ?


Let's say you were designing a storage layer API from scratch for
what Postgres does. That might take a relation as its input, like
ReadBuffer does. Hiding the details of how that turns into a physical
file operation would be a useful goal of such a layer. I'd then
consider it a problem if that exposed things like the actual mapping
of relations into files to callers.


Ok, got it.


What we actually have right now is this MarkDirty function that
operates on BufferTag data, which points directly to the underlying
file. I see that as cutting the storage API in half and calling a
function in the middle of the implementation.


Well, no, the BufferTag struct is internal to the buffer manager 
implementation. It's not part of the API; it's an implementation detail 
of the buffer manager.



It strikes me as kind of weird that the read side and write side are
not more symmetrical.


It might seem weird if you expect the API to be similar to POSIX read() 
and write(), where you can read() an arbitrary block at any location, 
and write() an arbitrary block at any location. A better comparison 
would be e.g open() and close(). open() returns a file descriptor, which 
you pass to other functions to operate on the file. When you're done, 
you call close(fd). The file descriptor is completely opaque to the user 
program, you do all the operations through the functions that take the 
fd as argument. Similarly, ReadBuffer() returns a Buffer, which is 
completely opaque to the caller, and you do all the operations through 
various functions and macros that operate on the Buffer. When you're 
done, you release the buffer with ReleaseBuffer().


(sorry for the digression from the original topic, I don't have any 
problem with what adding an optional Relation argument to MarkBuffer if 
you need that :-) )


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Block write statistics WIP

2013-05-20 Thread Heikki Linnakangas

On 19.05.2013 11:15, Greg Smith wrote:

I've thought of two paths to get a block write count out of that so far:

-Provide a function to find the Relation from the RelFileNode. There is
a warning about the perils of assuming you can map that way from a
buftag value in buf_internals.h though:

Note: the BufferTag data must be sufficient to determine where to write
the block, without reference to pg_class or pg_tablespace entries. It's
possible that the backend flushing the buffer doesn't even believe the
relation is visible yet (its xact may have started before the xact that
created the rel). The storage manager must be able to cope anyway.

-Modify every caller of MarkDirty* to include a relation when that
information is available. There are about 200 callers of those functions
around, so that won't be fun. I noted that many of them are in the XLog
replay functions, which won't have the relation--but those aren't
important to count anyway.

Neither of these options feels very good to me, so selecting between the
two feels like picking the lesser of two evils. I'm fine with chugging
through all of the callers to modify MarkDirty*, but I didn't want to do
that only to have the whole approach rejected as wrong. That's why I
stopped here to get some feedback.


Adding a Relation argument to all the Mark* calls seems fine to me. I 
don't find it ugly at all. The biggest objection would be that if there 
are extensions out there that call those functions, they would need to 
be changed, but I think that's fine.



The way that MarkDirty requires this specific underlying storage manager
behavior to work properly strikes me as as a bit of a layering violation
too. I'd like the read and write paths to have a similar API, but here
they don't even operate on the same type of inputs. Addressing that is
probably harder than just throwing a hack on the existing code though.


To be honest, I don't understand what you mean by that. ?

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Block write statistics WIP

2013-05-19 Thread Greg Smith
I have some time now for working on the mystery of why there are no 
block write statistics in the database.  I hacked out the statistics 
collection and reporting side already, rough patch attached if you want 
to see the boring parts.


I guessed that there had to be a gotcha behind why this wasn't done 
before now, and I've found one so far.  All of the read statistics are 
collected with a macro that expects to know a Relation number.  Callers 
to ReadBuffer pass one.  On the write side, the two places that 
increment the existing write counters (pg_stat_statements, vacuum) are 
MarkBufferDirty and MarkBufferDirtyHint.  Neither of those is given a 
Relation though.  Inspecting the Buffer passed can only find the buffer 
tag's RelFileNode.


I've thought of two paths to get a block write count out of that so far:

-Provide a function to find the Relation from the RelFileNode.  There is 
a warning about the perils of assuming you can map that way from a 
buftag value in buf_internals.h though:


Note: the BufferTag data must be sufficient to determine where to write 
the block, without reference to pg_class or pg_tablespace entries.  It's 
possible that the backend flushing the buffer doesn't even believe the 
relation is visible yet (its xact may have started before the xact that 
created the rel).  The storage manager must be able to cope anyway.


-Modify every caller of MarkDirty* to include a relation when that 
information is available.  There are about 200 callers of those 
functions around, so that won't be fun.  I noted that many of them are 
in the XLog replay functions, which won't have the relation--but those 
aren't important to count anyway.


Neither of these options feels very good to me, so selecting between the 
two feels like picking the lesser of two evils.  I'm fine with chugging 
through all of the callers to modify MarkDirty*, but I didn't want to do 
that only to have the whole approach rejected as wrong.  That's why I 
stopped here to get some feedback.


The way that MarkDirty requires this specific underlying storage manager 
behavior to work properly strikes me as as a bit of a layering violation 
too.  I'd like the read and write paths to have a similar API, but here 
they don't even operate on the same type of inputs.  Addressing that is 
probably harder than just throwing a hack on the existing code though.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com
diff --git a/src/backend/catalog/system_views.sql 
b/src/backend/catalog/system_views.sql
index a03bfa6..1773d59 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -467,15 +467,19 @@ CREATE VIEW pg_statio_all_tables AS
 pg_stat_get_blocks_fetched(C.oid) -
 pg_stat_get_blocks_hit(C.oid) AS heap_blks_read,
 pg_stat_get_blocks_hit(C.oid) AS heap_blks_hit,
+pg_stat_get_blocks_dirtied(C.oid) AS heap_blks_dirtied,
 sum(pg_stat_get_blocks_fetched(I.indexrelid) -
 pg_stat_get_blocks_hit(I.indexrelid))::bigint AS 
idx_blks_read,
 sum(pg_stat_get_blocks_hit(I.indexrelid))::bigint AS idx_blks_hit,
+sum(pg_stat_get_blocks_dirtied(I.indexrelid))::bigint AS 
idx_blks_dirtied,
 pg_stat_get_blocks_fetched(T.oid) -
 pg_stat_get_blocks_hit(T.oid) AS toast_blks_read,
 pg_stat_get_blocks_hit(T.oid) AS toast_blks_hit,
+pg_stat_get_blocks_dirtied(T.oid) AS toast_blks_dirtied,
 pg_stat_get_blocks_fetched(X.oid) -
 pg_stat_get_blocks_hit(X.oid) AS tidx_blks_read,
 pg_stat_get_blocks_hit(X.oid) AS tidx_blks_hit
+pg_stat_get_blocks_dirted(X.oid) AS tidx_blks_dirtied
 FROM pg_class C LEFT JOIN
 pg_index I ON C.oid = I.indrelid LEFT JOIN
 pg_class T ON C.reltoastrelid = T.oid LEFT JOIN
@@ -530,6 +534,7 @@ CREATE VIEW pg_statio_all_indexes AS
 pg_stat_get_blocks_fetched(I.oid) -
 pg_stat_get_blocks_hit(I.oid) AS idx_blks_read,
 pg_stat_get_blocks_hit(I.oid) AS idx_blks_hit
+pg_stat_get_blocks_dirtied(I.oid) AS idx_blks_dirtied
 FROM pg_class C JOIN
 pg_index X ON C.oid = X.indrelid JOIN
 pg_class I ON I.oid = X.indexrelid
@@ -554,6 +559,7 @@ CREATE VIEW pg_statio_all_sequences AS
 pg_stat_get_blocks_fetched(C.oid) -
 pg_stat_get_blocks_hit(C.oid) AS blks_read,
 pg_stat_get_blocks_hit(C.oid) AS blks_hit
+pg_stat_get_blocks_dirtied(C.oid) AS blks_dirtied
 FROM pg_class C
 LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
 WHERE C.relkind = 'S';
@@ -622,6 +628,7 @@ CREATE VIEW pg_stat_database AS
 pg_stat_get_db_blocks_fetched(D.oid) -
 pg_stat_get_db_blocks_hit(D.oid) AS