Re: [HACKERS] New statistics for WAL buffer dirty writes

2012-07-29 Thread Simon Riggs
On 7 July 2012 18:06, Robert Haas  wrote:

> Sure, but I doubt that would be as informative as this.  It's no big deal if 
> you hit 100% every once in a while; what you really want to know is whether 
> it's happening once per second or once per week.

Agreed.

I can't see an easy way of recording the high water mark % and I'm not
sure how we'd use it if we had it.

Let's just track how often we run out of space because that is when
bad things happen, not before.
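
For illustration, a minimal sketch of such a counter at the point where
we must flush a dirty WAL buffer page to make room (all names here are
hypothetical, not taken from the actual patch):

#include <stdint.h>

/*
 * Hypothetical sketch: count how often a dirty WAL buffer page must be
 * written out just to make room for new WAL data.  A rate of once per
 * second suggests wal_buffers is too small; once per week is harmless.
 */
static uint64_t wal_buffers_dirty_writes = 0;

/* Called (in this sketch) when the next WAL buffer slot is still dirty. */
static void
count_wal_buffer_dirty_write(void)
{
    wal_buffers_dirty_writes++;
    /* ... then write the dirty page so the slot can be reused ... */
}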

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



[HACKERS] Re: Prevent restored WAL files from being archived again Re: Unnecessary WAL archiving after failover

2012-07-29 Thread Fujii Masao
On Mon, Jul 30, 2012 at 12:01 AM, Fujii Masao  wrote:
> On Tue, Jun 5, 2012 at 3:37 PM, Noah Misch  wrote:
>> On Fri, Mar 23, 2012 at 11:03:27PM +0900, Fujii Masao wrote:
>>> (2) WAL files which were restored from the archive
>>>
>>> In 9.1 or before, the restored WAL files don't remain after failover
>>> because they are always restored onto the temporary file name
>>> "RECOVERYXLOG". So the issue I describe below doesn't exist
>>> in 9.1 or before.
>>>
>>> In 9.2dev, as a result of supporting cascading replication,
>>> an archived WAL file is restored onto its correct file name so that
>>> a cascading walsender can send it to another standby. This restored
>>
>> The documentation still says this:
>>
>>   WAL segments that cannot be found in the archive will be sought in
>>   pg_xlog/; this allows use of recent un-archived segments. However,
>>   segments that are available from the archive will be used in preference
>>   to files in pg_xlog/. The system will not overwrite the existing
>>   contents of pg_xlog/ when retrieving archived files.
>>
>> I gather the last sentence is now false?
>
> Yes. Attached patch removes that sentence.
>
>>> WAL file has neither a .ready nor a .done archive status file. After
>>> failover, checkpoint checks the archive status file of the restored
>>> WAL file when attempting to recycle it, finds that it has neither .ready
>>> nor .done, and creates .ready. Because of the existence of .ready,
>>> the file will be archived again even though it obviously already exists
>>> in the archival storage :(
>>>
>>> To prevent a restored WAL file from being archived again, I think
>>> that .done should be created whenever a WAL file is successfully
>>> restored (of course this should happen only when archive_mode is
>>> enabled). Thoughts?
>>
>> Your proposed fix makes sense, and I cannot think of any disadvantage.
>> Concerning only doing it when archive_mode=on, would there ever be a case
>> where a segment is restored under archive_mode=off, then the server restarted
>> with archive_mode=on and an archival attempted on that segment?
>
> Yes, a .done file should be created even if archive mode is not enabled.
>
> Attached patch changes the startup process so that it creates a .done file
> whenever a WAL file is successfully restored, whether archive mode is
> enabled or not. The restored WAL files will then not be archived again,
> because of the .done file.

This patch applies cleanly to HEAD, but not to REL9_2_STABLE, so here
is a separate patch for REL9_2_STABLE.

Regards,

-- 
Fujii Masao


dont_archive_restored_walfile_for_REL9_2_STABLE_v1.patch
Description: Binary data



[HACKERS] Prevent restored WAL files from being archived again Re: Unnecessary WAL archiving after failover

2012-07-29 Thread Fujii Masao
On Tue, Jun 5, 2012 at 3:37 PM, Noah Misch  wrote:
> On Fri, Mar 23, 2012 at 11:03:27PM +0900, Fujii Masao wrote:
>> (2) WAL files which were restored from the archive
>>
>> In 9.1 or before, the restored WAL files don't remain after failover
>> because they are always restored onto the temporary file name
>> "RECOVERYXLOG". So the issue I describe below doesn't exist
>> in 9.1 or before.
>>
>> In 9.2dev, as a result of supporting cascading replication,
>> an archived WAL file is restored onto its correct file name so that
>> a cascading walsender can send it to another standby. This restored
>
> The documentation still says this:
>
>   WAL segments that cannot be found in the archive will be sought in pg_xlog/;
>   this allows use of recent un-archived segments. However, segments that are
>   available from the archive will be used in preference to files in
>   pg_xlog/. The system will not overwrite the existing contents of pg_xlog/
>   when retrieving archived files.
>
> I gather the last sentence is now false?

Yes. Attached patch removes that sentence.

>> WAL file has neither a .ready nor a .done archive status file. After
>> failover, checkpoint checks the archive status file of the restored
>> WAL file when attempting to recycle it, finds that it has neither .ready
>> nor .done, and creates .ready. Because of the existence of .ready,
>> the file will be archived again even though it obviously already exists
>> in the archival storage :(
>>
>> To prevent a restored WAL file from being archived again, I think
>> that .done should be created whenever a WAL file is successfully
>> restored (of course this should happen only when archive_mode is
>> enabled). Thoughts?
>
> Your proposed fix makes sense, and I cannot think of any disadvantage.
> Concerning only doing it when archive_mode=on, would there ever be a case
> where a segment is restored under archive_mode=off, then the server restarted
> with archive_mode=on and an archival attempted on that segment?

Yes, a .done file should be created even if archive mode is not enabled.

Attached patch changes the startup process so that it creates a .done file
whenever a WAL file is successfully restored, whether archive mode is
enabled or not. The restored WAL files will then not be archived again,
because of the .done file.
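
For illustration, a minimal sketch of the idea (the helper name and raw
stdio are illustrative; the actual patch routes through the existing
archive-status code in xlog.c):

#include <stdio.h>

/*
 * Sketch: after a WAL segment has been successfully restored from the
 * archive onto its normal file name, immediately create its .done
 * archive status file, so that a later checkpoint never finds the
 * segment unmarked, creates .ready, and archives it a second time.
 */
static void
mark_restored_segment_done(const char *xlogfname)
{
    char    path[1024];
    FILE   *fd;

    /* archive status files live under pg_xlog/archive_status/ */
    snprintf(path, sizeof(path), "pg_xlog/archive_status/%s.done",
             xlogfname);

    /* an empty file is enough; only its name matters */
    fd = fopen(path, "w");
    if (fd != NULL)
        fclose(fd);
}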

Regards,

-- 
Fujii Masao


dont_archive_restored_walfile_v1.patch
Description: Binary data



Re: [HACKERS] Adding probes for smgr

2012-07-29 Thread Satoshi Nagayasu

(2012/07/29 12:14), Tom Lane wrote:
> Peter Geoghegan  writes:
>> On 28 July 2012 17:15, Tom Lane  wrote:
>>> IMV smgr is pretty vestigial.  I wouldn't recommend loading more
>>> functionality onto that layer, because it's as likely as not that
>>> we'll just get rid of it someday.
> 
>> Agreed. I recently found myself reading a paper written by Stonebraker
>> back in the Berkeley days:
>> http://dislab2.hufs.ac.kr/dislab/seminar/2007/ERL-M87-06.pdf
>> This paper appears to have been published in about 1988, and it shows.
>> It's fairly obvious from reading the opening paragraph that the
>> original rationale for the design of the storage manager doesn't hold
>> these days. Of course, it's also obvious from reading the code, since
>> for example there is only one storage manager module.
> 
> Yeah.  There were actually two storage managers in what we inherited
> from Berkeley, but we soon got rid of the other one as being useless.
> (IIRC it was meant for magnetic-core memory ... anybody seen any of
> that lately?)

I remember finding mm.c when I started looking around the storage
manager code, maybe a decade ago. It seemed very interesting, but I've
never seen "magnetic-core memory" itself. :)

Anyway, I now see that it's reasonable to add the new probes to the
md module at this time.

Thanks for the comments.

>
> I think basically what happened since then is that the functionality
> Stonebraker et al imagined as being in per-storage-manager code all
> migrated into the kernel device drivers, or even down into the hardware
> itself.  (SSDs are *way* smarter than the average '80s storage device,
> and even those were an order of magnitude smarter than what they'd been
> ten years previously.  I used to do device drivers back in the 80's...)
> There's no longer any good reason to have anything but md.c, which isn't
> so much a "magnetic disk" interface as an "interface to something that
> has a Unix block device driver".

BTW, I'm still interested in keeping the storage manager architecture
"theoretically" pluggable. From my point of view, the pluggable
architecture was designed to let a different storage manager bypass
the POSIX file system API.

I agree that most recent storage devices can be managed under ordinary
file systems, ext3/ext4 or xfs for example.

However, I'm still curious to see a new project that extends the
storage manager to get more performance out of special storage devices
by bypassing the ordinary file system API.

For example, Fusion-io offers a dedicated API for their PCI Express
flash storage, and I guess database performance could benefit
tremendously from it. (I admit that's just my curiosity, though. :)

It's just a possible option, but keeping the storage architecture
pluggable makes sense to me even if the community ships only one
storage manager.
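
To illustrate, a much-simplified sketch of the dispatch table that keeps
smgr pluggable, loosely modeled on the f_smgr table in
src/backend/storage/smgr/smgr.c (signatures and fields are heavily
abridged; the real table has many more entries):

/* one row of function pointers per storage manager */
typedef struct f_smgr_sketch
{
    void    (*smgr_read) (int blocknum, char *buffer);
    void    (*smgr_write) (int blocknum, const char *buffer);
} f_smgr_sketch;

/* md.c-style implementation: plain files through the file system API */
static void
md_read_sketch(int blocknum, char *buffer)
{
    (void) blocknum; (void) buffer;     /* pread() from a segment file */
}

static void
md_write_sketch(int blocknum, const char *buffer)
{
    (void) blocknum; (void) buffer;     /* pwrite() to a segment file */
}

static const f_smgr_sketch smgrsw_sketch[] = {
    { md_read_sketch, md_write_sketch } /* 0: magnetic disk (md.c) */
    /*
     * A flash device with a native, file-system-bypassing API would
     * simply be a second row here; no caller would need to change,
     * since every caller goes through the table.
     */
};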

Regards,

> 
>> This state of affairs sort of reminds me of mcxt.c. The struct
>> MemoryContextData is described as an "abstract type" that can have
>> multiple implementations, despite the fact that since 2000 (and
>> perhaps earlier), the underlying type is invariably AllocSetContext. I
>> never investigated if that indirection still needs to exist, but I
>> suspect that it too is a candidate for refactoring. Do you agree?
> 
> Meh.  Having invented the MemoryContext interface, I am probably not
> the best-qualified person to evaluate it objectively.  The original
> thought was that we might have (a) a context type that could allocate
> storage in shared memory, and/or (b) a context type that could provide
> better allocation speed at a loss of storage efficiency (eg, lose the
> ability to pfree individual chunks).  Case (a) has never become
> practical given the inability of SysV-style shared memory to expand at
> all.  I don't know if that might change when/if we switch to some other
> shmem API.  The idea of a different allocation strategy for some usages
> still seems like something we'll want to do someday, though.
> 
>   regards, tom lane
> 
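
For context, the indirection Peter mentions is a small vtable. A
simplified sketch of its shape, abridged from
src/include/nodes/memnodes.h (the exact field list varies by release):

#include <stdbool.h>
#include <stddef.h>

struct MemoryContextData_sketch;    /* stand-in for MemoryContext */

typedef struct MemoryContextMethods_sketch
{
    void   *(*alloc) (struct MemoryContextData_sketch *cxt, size_t size);
    void    (*free_p) (struct MemoryContextData_sketch *cxt, void *ptr);
    void   *(*realloc) (struct MemoryContextData_sketch *cxt, void *ptr,
                        size_t size);
    void    (*reset) (struct MemoryContextData_sketch *cxt);
    void    (*delete_context) (struct MemoryContextData_sketch *cxt);
    bool    (*is_empty) (struct MemoryContextData_sketch *cxt);
} MemoryContextMethods_sketch;

aset.c supplies the only method table today; the cheaper-but-no-pfree
allocator described above would simply be a second one.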


-- 
Satoshi Nagayasu 
Uptime Technologies, LLC. http://www.uptime.jp

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers