>>> On Wed, Sep 26, 2007 at 3:14 PM, in message <[EMAIL PROTECTED]>,
Simon Riggs <[EMAIL PROTECTED]> wrote:

> It's nicely written

Thanks. I spent some time looking at Tom Lane's pg_resetxlog and the
source code for cat to model my code. I'm rather rusty on C, so I wanted
to minimize the chance of doing anything outrageously stupid. Should I
be including anything in the comments to give credit for that? (I'm
never sure where the line is on that.)

> and looks like it would perform well.

In my tests so far, it is faster to pipe through this and then gzip than
to just gzip, except when the WAL file is full or nearly so. In tests
with small counties (which rarely fill a file except at peak periods),
I'm seeing archival WAL space reduced to 27% of the original. I expect
that to climb to 35% to 40% when we do all counties, but that's just a
guess. I've seen some clues that it will get a bit better in 8.3 because
of HOT updates. (We force WAL files to be written hourly, by the way.)

For us, this reduces overall CPU time used in archiving, reduces disk
space needed for backups, and reduces network traffic (including over a
relatively slow WAN). The one downside I've found is that it adds 0.2
seconds of CPU time per WAL file archived during our heaviest update
periods. It's in the archiver process, not a backend process that's
running a query, and we're not generally CPU bound, so this is not a
problem for us.

> The logic for zeroing the blocks makes me nervous. It doesn't locate the
> block from which to start, it treats all blocks equally, so might zero
> some blocks and not others. What you have should work, but I'd be
> inclined to put a test in there to check that doesn't happen: once we
> begin to zero pages, all of them should be zeroed to end of file. If we
> find one that shouldn't be zeroed, throw an error.

Agreed. That is one of the reasons I referred to this as a first, rough
version. I wanted to prove the technique in general before that
refinement.
Another reason is that it is rather light on error checking in general.
While I was loath to limit it to an exact match on the magic number,
since it works unmodified on multiple versions, it seems dangerous not
to enforce any limits there. I wasn't sure how best to approach that.
Suggestions? I think I should also raise an error if stdin has more data
when I think I'm done. Agreed?

I omitted the code I was originally considering to have it work against
files "in place" rather than as a filter. It seemed much simpler this
way; we didn't actually have a use case for the additional
functionality, and it seemed safer as a filter. Thoughts?

> We should also document that this is designed to help compress files
> that aren't full because we switched early because of archive_timeout.

Sure. Again, this is more at a "proof of concept" stage. It's enough to
get us out of a tight spot on drive space, even as it stands, but I know
that it needs polishing and documentation if it is to be accepted by the
community. I just wasn't sure the interest was actually there. I'm still
not sure whether this might be considered for inclusion in the base
release or contrib, or whether I should open a pgfoundry project.

Thanks for the feedback.

-Kevin