On Wed, May 24, 2017 at 6:28 AM, Mithun Cy <mithun...@enterprisedb.com> wrote: > On Tue, May 23, 2017 at 7:06 PM, Mithun Cy <mithun...@enterprisedb.com> wrote: >> Thanks, Andres, >> >> I have tried to fix all of your comments. > > There was a typo issue in previous patch 07 where instead of == I have > used !=. And, a mistake in comments. I have corrected same now.
+/* + * autoprewarm : + * + * What is it? + * =========== + * A bgworker which automatically records information about blocks which were + * present in buffer pool before server shutdown and then prewarm the buffer + * pool upon server restart with those blocks. + * + * How does it work? + * ================= + * When the shared library "pg_prewarm" is preloaded, a + * bgworker "autoprewarm" is launched immediately after the server has reached + * consistent state. The bgworker will start loading blocks recorded in the + * format BlockInfoRecord + * <<DatabaseId,TableSpaceId,RelationId,Forknum,BlockNum>> in + * $PGDATA/AUTOPREWARM_FILE, until there is no free buffer left in the buffer + * pool. This way we do not replace any new blocks which were loaded either by + * the recovery process or the querying clients. + * + * Once the "autoprewarm" bgworker has completed its prewarm task, it will + * start a new task to periodically dump the BlockInfoRecords related to blocks + * which are currently in shared buffer pool. Upon next server restart, the + * bgworker will prewarm the buffer pool by loading those blocks. The GUC + * pg_prewarm.dump_interval will control the dumping activity of the bgworker. + */ Make this part of the file header comment. Also, add an enabling GUC. The default can be true, but it should be possible to preload the library so that the SQL functions are available without a dynamic library load without requiring you to get the auto-prewarm behavior. I suggest pg_prewarm.autoprewarm = true / false. Your SigtermHandler and SighupHandler routines are still capitalized in a way that's not very consistent with what we do elsewhere. I think all of our other signal handlers have names_like_this() not NamesLikeThis(). + * ============== types and variables used by autoprewam ============= Spelling. + * Meta-data of each persistent block which is dumped and used to load. Metadata +typedef struct BlockInfoRecord +{ + Oid database; /* database */ + Oid spcNode; /* tablespace */ + Oid filenode; /* relation's filenode. */ + ForkNumber forknum; /* fork number */ + BlockNumber blocknum; /* block number */ +} BlockInfoRecord; spcNode is capitalized differently from all of the other members. + LWLock *lock; /* protects SharedState */ Just declare this as LWLock lock, and initialize it using LWLockInitialize. The way you're doing it is more complicated. +static dsa_area *AutoPrewarmDSA = NULL; DSA seems like more than you need here. There's only one allocation being performed. I think it would be simpler and less error-prone to use DSM directly. I don't even think you need a shm_toc. You could just do: seg = dsm_create(segsize, 0); blkinfo = dsm_segment_address(seg); Then pass dsm_segment_handle(seg) to the worker using bgw_main_arg or bgw_extra, and have it call dsm_attach. An advantage of this approach is that you only allocate the memory you actually need, whereas DSA will allocate more, expecting that you might do further allocations. + pg_qsort(block_info_array, num_blocks, sizeof(BlockInfoRecord), + blockinfo_cmp); I think it would make more sense to sort the array on the load side, in the autoprewarm worker, rather than when dumping. First, many dumps will never be reloaded, so there's no real reason to waste time sorting them. Second, the way you've got the code right now, it relies heavily on the assumption that the dump file will be sorted, but that doesn't seem like a necessary assumption. We don't really expect users to hand-edit the dump files, but if they do, it'd be nice if it didn't randomly break things unnecessarily. + errmsg("autoprewarm: could not open \"%s\": %m", + dump_file_path))); Use standard error message wordings! Don't create new translatable messages by adding "autoprewarm" to error messages. There are multiple instances of this throughout the patch. + snprintf(dump_file_path, sizeof(dump_file_path), + "%s", AUTOPREWARM_FILE); This is completely pointless. You can get rid of the dump_file_path variable and just use AUTOPREWARM_FILE wherever you would have used dump_file_path. It's just making a copy of a string you already have. Also, this code interrupts the flow of the surrounding logic in a weird way; even if we needed it, it would make more sense to put it up higher, where we construct the temporary path, or down lower, where the value is actually needed. + SPI_connect(); + PushActiveSnapshot(GetTransactionSnapshot()); It is really unclear why you need this, since the code does not use SPI for anything, or do anything that needs a snapshot. + goto end_load; I think you should try to rewrite this function so that it doesn't need "goto". I also think in general that this function could be written in a much more direct way by getting rid of the switch and the BLKTYPE_* constants altogether. connect_to_db() can only happen once, so do that the beginning. After that, the logic can look roughly like this: BlockInfoRecord *old_blk = NULL; while (!got_sigterm && pos < maxpos && have_free_buffer()) { BlockInfoRecord *blk = block_info[pos]; /* Quit if we've reached records for another database. */ if (old_blk != NULL && old_blk->dbOid != blk->dbOid) break; /* * When we reach a new relation, close the old one. Note, however, * that the previous try_relation_open may have failed, in which case * rel will be NULL. */ if (old_blk != NULL && old_blk->relOid != blk->relOid && rel != NULL) { relation_close(rel, AccessShareLock); rel = NULL; } /* * Try to open each new relation, but only once, when we first * encounter it. If it's been dropped, skip the associated blocks. */ if (old_blk == NULL || old_blk->relOid != blk->relOid) { Assert(rel == NULL); rel = try_relation_open(blk->relOid, AccessShareLock); } if (!rel) { ++pos; continue; } /* Once per fork, check for fork existence and size. */ if (old_blk == NULL || old_blk->forkNumber != blk->forkNumber) { RelationOpenSmgr(rel); if (smgrexists(rel->rd_smgr, blk->forkNumber)) nblocks = RelationGetNumberOfBlocksInFork(...); else nblocks = 0; } /* Prewarm buffer. */ buf = ReadBufferExtended(...); ... ++pos; } + errhint("Kill all remaining database processes and restart" + " the database."))); Don't break strings across lines. Just put it all on one (long) line. I don't think you should really need default_database. It seems like it should be possible to jigger things so that those blocks are loaded together with some other database (say, let the first worker do it). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers