Re: [HACKERS] wait events for disk I/O
On Sat, Mar 18, 2017 at 10:52 PM, Rushabh Lathia wrote: > > > On Sat, Mar 18, 2017 at 5:15 PM, Robert Haas wrote: >> >> On Fri, Mar 17, 2017 at 10:01 AM, Rushabh Lathia >> wrote: >> > I tried to cover all the suggestion in the attached latest patch. >> >> Committed. I reworded the documentation entries, renamed a few of the >> wait events to make things more consistent, put all three lists in >> rigorous alphabetical order, and I fixed a couple of places where an >> error return from a system call could lead to returning without >> clearing the wait event. >> Thanks, I ran pgbench with shared_buffers set to 128kB, able to see below wait IO events. DataFileRead DataFileWrite WALInitSync WALInitWrite WALWrite DataFileSync WALRead Thanks & Regards, Rajkumar Raghuwanshi QMG, EnterpriseDB Corporation -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] wait events for disk I/O
On Sat, Mar 18, 2017 at 5:15 PM, Robert Haas wrote: > On Fri, Mar 17, 2017 at 10:01 AM, Rushabh Lathia > wrote: > > I tried to cover all the suggestion in the attached latest patch. > > Committed. I reworded the documentation entries, renamed a few of the > wait events to make things more consistent, put all three lists in > rigorous alphabetical order, and I fixed a couple of places where an > error return from a system call could lead to returning without > clearing the wait event. > > Thanks Robert. > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company > -- Rushabh Lathia
Re: [HACKERS] wait events for disk I/O
On Fri, Mar 17, 2017 at 10:01 AM, Rushabh Lathia wrote: > I tried to cover all the suggestion in the attached latest patch. Committed. I reworded the documentation entries, renamed a few of the wait events to make things more consistent, put all three lists in rigorous alphabetical order, and I fixed a couple of places where an error return from a system call could lead to returning without clearing the wait event. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] wait events for disk I/O
Thanks Robert for the review. On Thu, Mar 16, 2017 at 8:05 PM, Robert Haas wrote: > On Thu, Mar 16, 2017 at 8:28 AM, Rahila Syed > wrote: > > Thank you for the updated patch. > > > > I have applied and tested it on latest sources and the patch looks good > to > > me. > > The documentation puts the new wait events in a pretty random order. > I think they should be alphabetized, like we do with the IPC events. > Done. > I also suggest we change the naming scheme so that the kind of thing > being operated on is first and this is followed by the operation name. > This will let us keep related entries next to each other after > alphabetizing. So with that principle in mind: > > Yes above naming scheme is more clear then the one i choose. - instead of ReadDataBlock etc. I propose DataFileRead, DataFileWrite, > DataFileSync, DataFileExtend, DataFileFlush, DataFilePrefetch, > DataFileTruncate. using file instead of block avoids singular/plural > confusion. > - instead of RelationSync and RelationImmedSync I proposed > DataFileSync and DataFileImmediateSync; these are md.c operations like > the previous set, so why name it differently? > Yes, you are right, DataFileSync and DataFileImmediateSync make more sense. > - instead of WriteRewriteDataBlock and SyncRewriteDataBlock and > TruncateLogicalMappingRewrite, which aren't consistent with each other > even though they are related, I propose LogicalRewriteWrite, > LogicalRewriteSync, and LogicalRewriteTruncate, which are also closer > to the names of the functions that contain those wait points > - for ReadBuffile and WriteBuffile seem OK, I propose BufFileRead and > BufFileWrite, again reversing the order and also tweaking the > capitalization > - in keeping with our new policy of referring to xlog as wal in user > visible interfaces, I propose WALRead, WALCopyRead, WALWrite, > WALInitWrite, WALCopyWrite, WALBootstrapWrite, WALInitSync, > WALBootstrapSync, WALSyncMethodAssign > - for control file ops, ControlFileRead, ControlFileWrite, > ControlFileWriteUpdate, ControlFileSync, ControlFileSyncUpdate > - ReadApplyLogicalMapping and friends seem to have to do with the > reorderbuffer code, so maybe ReorderBufferRead etc. > - there seems to be some discrepancy between the documentation and > pgstat_get_wait_io for the snapbuild stuff. maybe SnapBuildWrite, > SnapBuildSync, SnapBuildRead. > - SLRURead, SLRUWrite, etc. > - TimelineHistoryRead, etc. > - the syslogger changes should be dropped, since the syslogger is not > and should not be connected to shared memory > Ok removed. > - the replslot terminology seems like a case of odd capitalization and > overeager abbreviation. why not ReplicationSlotRead, > ReplicationSlotWrite, etc? similarly RelationMapRead, > RelationMapWrite, etc? > - CopyFileRead, CopyFileWrite > - LockFileCreateRead, etc. > - AddToDataDirLockFileRead is a little long and incomprehensible; > maybe LockFileUpdateRead etc. > How about LockFileAddToDataDirRead? even though its little long but it gives clear understanding about what's going on. > - DSMWriteZeroBytes, maybe? > > DSMFillZeroWrite? Basically want to keep the file IP operation at the end of the event name. > Of course the constants should be renamed to match. > I tried to cover all the suggestion in the attached latest patch. Thanks, Rushabh Lathia www.EnterpriseDB.com diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index 9eaf43a..acf310e 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -716,6 +716,12 @@ postgres 27093 0.0 0.0 30096 2752 ?Ss 11:34 0:00 postgres: ser point. + + + IO: The server process is waiting for a IO to complete. + wait_event will identify the specific wait point. + + @@ -1272,6 +1278,271 @@ postgres 27093 0.0 0.0 30096 2752 ?Ss 11:34 0:00 postgres: ser RecoveryApplyDelay Waiting to apply WAL at recovery because it is delayed. + + IO + BufFileRead + Waiting during buffer read operation. + + + BufFileWrite + Waiting during buffer write operation. + + + CheckpointLogicalRewriteSync + Waiting to sync logical mapping during a checkpoint for logical rewrite mappings. + + + ControlFileRead + Waiting to read the control file. + + + ControlFileSync + Waiting to sync the control file. + + + ControlFileSyncUpdate + Waiting to sync the control file during update control file. + + + ControlFileWrite + Waiting to write the control file. + + + ControlFileWriteUpdate + Waiting to write the control file during update control file. + + + CopyFileRead +
Re: [HACKERS] wait events for disk I/O
On Thu, Mar 16, 2017 at 8:28 AM, Rahila Syed wrote: > Thank you for the updated patch. > > I have applied and tested it on latest sources and the patch looks good to > me. The documentation puts the new wait events in a pretty random order. I think they should be alphabetized, like we do with the IPC events. I also suggest we change the naming scheme so that the kind of thing being operated on is first and this is followed by the operation name. This will let us keep related entries next to each other after alphabetizing. So with that principle in mind: - instead of ReadDataBlock etc. I propose DataFileRead, DataFileWrite, DataFileSync, DataFileExtend, DataFileFlush, DataFilePrefetch, DataFileTruncate. using file instead of block avoids singular/plural confusion. - instead of RelationSync and RelationImmedSync I proposed DataFileSync and DataFileImmediateSync; these are md.c operations like the previous set, so why name it differently? - instead of WriteRewriteDataBlock and SyncRewriteDataBlock and TruncateLogicalMappingRewrite, which aren't consistent with each other even though they are related, I propose LogicalRewriteWrite, LogicalRewriteSync, and LogicalRewriteTruncate, which are also closer to the names of the functions that contain those wait points - for ReadBuffile and WriteBuffile seem OK, I propose BufFileRead and BufFileWrite, again reversing the order and also tweaking the capitalization - in keeping with our new policy of referring to xlog as wal in user visible interfaces, I propose WALRead, WALCopyRead, WALWrite, WALInitWrite, WALCopyWrite, WALBootstrapWrite, WALInitSync, WALBootstrapSync, WALSyncMethodAssign - for control file ops, ControlFileRead, ControlFileWrite, ControlFileWriteUpdate, ControlFileSync, ControlFileSyncUpdate - ReadApplyLogicalMapping and friends seem to have to do with the reorderbuffer code, so maybe ReorderBufferRead etc. - there seems to be some discrepancy between the documentation and pgstat_get_wait_io for the snapbuild stuff. maybe SnapBuildWrite, SnapBuildSync, SnapBuildRead. - SLRURead, SLRUWrite, etc. - TimelineHistoryRead, etc. - the syslogger changes should be dropped, since the syslogger is not and should not be connected to shared memory - the replslot terminology seems like a case of odd capitalization and overeager abbreviation. why not ReplicationSlotRead, ReplicationSlotWrite, etc? similarly RelationMapRead, RelationMapWrite, etc? - CopyFileRead, CopyFileWrite - LockFileCreateRead, etc. - AddToDataDirLockFileRead is a little long and incomprehensible; maybe LockFileUpdateRead etc. - DSMWriteZeroBytes, maybe? Of course the constants should be renamed to match. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] wait events for disk I/O
Thank you for the updated patch. I have applied and tested it on latest sources and the patch looks good to me. >I am not quite sure about this, as this is for stat statements. Also part from the >place you found there are many other fwrite() call into pg_stat_statements, and >I intentionally haven't added event here as it is very small write about stat, and >doesn't look like we should add for those call. I agree that this writes less amount of data. Although tracking this can be useful too in scenarios where pg_stat_statements is lagging due to I/O bottleneck. I will leave this decision to the committer. Thank you, Rahila Syed On Wed, Mar 15, 2017 at 1:03 PM, Rushabh Lathia wrote: > Thanks Rahila for reviewing this patch. > > On Tue, Mar 14, 2017 at 8:13 PM, Rahila Syed > wrote: > >> Hello, >> >> I applied and tested this patch on latest sources and it works fine. >> >> Following are some comments, >> >> >+ /* Wait event for SNRU */ >> >+ WAIT_EVENT_READ_SLRU_PAGE, >> Typo in the comment. >> >> > Fixed. > > >> >FileWriteback(v->mdfd_vfd, seekpos, (off_t) BLCKSZ * nflush, >> WAIT_EVENT_FLUSH_DATA_BLOCK); >> This call is inside mdwriteback() which can flush more than one block so >> should WAIT_EVENT _FLUSH_DATA_BLOCK >> be renamed to WAIT_EVENT_FLUSH_DATA_BLOCKS? >> >> > Changed with WAIT_EVENT_FLUSH_DATA_BLOCKS. > > >> Should calls to write() in following functions be tracked too? >> qtext_store() - This is related to pg_stat_statements >> >> > > I am not quite sure about this, as this is for stat statements. Also part > from the > place you found there are many other fwrite() call into > pg_stat_statements, and > I intentionally haven't added event here as it is very small write about > stat, and > doesn't look like we should add for those call. > > > >> dsm_impl_mmap() - This is in relation to creating dsm segments. >> >> > Added new event here. Actually particular write call is zero-filling the > DSM file. > > >> write_auto_conf_file()- This is called when updated configuration >> parameters are >> written to a temp file. >> >> > write_auto_conf_file() is getting called during the ALTER SYSTEM call. > Here write > happen only when someone explicitly run the ALTER SYSTEM call. This is > administrator call and so doesn't seem like necessary to add separate wait > event > for this. > > PFA latest patch with other fixes. > > >> >> On Wed, Mar 8, 2017 at 4:50 PM, Rushabh Lathia >> wrote: >> >>> >>> >>> On Wed, Mar 8, 2017 at 8:23 AM, Robert Haas >>> wrote: >>> On Tue, Mar 7, 2017 at 9:32 PM, Amit Kapila wrote: > On Tue, Mar 7, 2017 at 9:16 PM, Robert Haas wrote: >> On Mon, Mar 6, 2017 at 9:09 PM, Amit Kapila wrote: >>> Sure, if you think both Writes and Reads at OS level can have some >>> chance of blocking in obscure cases, then we should add a wait event >>> for them. >> >> I think writes have a chance of blocking in cases even in cases that >> are not very obscure at all. > > Point taken for writes, but I think in general we should have some > criteria based on which we can decide whether to have a wait event for > a particular call. It should not happen that we have tons of wait > events and out of which, only a few are helpful in most of the cases > in real-world scenarios. Well, the problem is that if you pick and choose which wait events to add based on what you think will be common, you're actually kind of hosing yourself. Because now when something uncommon happens, suddenly you don't get any wait event data and you can't tell what's happening. I think the number of new wait events added by Rushabh's patch is wholly reasonable. Yeah, some of those are going to be a lot more common than others, but so what? We add wait events so that we can find out what's going on. I don't want to sometimes know when a backend is blocked on an I/O. I want to ALWAYS know. >>> Yes, I agree with Robert. Knowing what we want and what we don't >>> want is difficult to judge. Something which we might think its not useful >>> information, and later of end up into situation where we re-think about >>> adding those missing stuff is not good. Having more information about >>> the system, specially for monitoring purpose is always good. >>> >>> I am attaching another version of the patch, as I found stupid mistake >>> in the earlier version of patch, where I missed to initialize initial >>> value to >>> WaitEventIO enum. Also earlier version was not getting cleanly apply on >>> the current version of sources. >>> >>> >>> >>> -- >>> Rushabh Lathia >>> EnterpriseDB: http://www.enterprisedb.com >>> The Enterprise PostgreSQL Company >>> >>> >>> >>> -- >>> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) >>> To make changes to your subscription: >>> http://www.postgresql.org/mailpref/pgsql-hackers >>> >>
Re: [HACKERS] wait events for disk I/O
Thanks Rahila for reviewing this patch. On Tue, Mar 14, 2017 at 8:13 PM, Rahila Syed wrote: > Hello, > > I applied and tested this patch on latest sources and it works fine. > > Following are some comments, > > >+ /* Wait event for SNRU */ > >+ WAIT_EVENT_READ_SLRU_PAGE, > Typo in the comment. > > Fixed. > >FileWriteback(v->mdfd_vfd, seekpos, (off_t) BLCKSZ * nflush, > WAIT_EVENT_FLUSH_DATA_BLOCK); > This call is inside mdwriteback() which can flush more than one block so > should WAIT_EVENT _FLUSH_DATA_BLOCK > be renamed to WAIT_EVENT_FLUSH_DATA_BLOCKS? > > Changed with WAIT_EVENT_FLUSH_DATA_BLOCKS. > Should calls to write() in following functions be tracked too? > qtext_store() - This is related to pg_stat_statements > > I am not quite sure about this, as this is for stat statements. Also part from the place you found there are many other fwrite() call into pg_stat_statements, and I intentionally haven't added event here as it is very small write about stat, and doesn't look like we should add for those call. > dsm_impl_mmap() - This is in relation to creating dsm segments. > > Added new event here. Actually particular write call is zero-filling the DSM file. > write_auto_conf_file()- This is called when updated configuration > parameters are > written to a temp file. > > write_auto_conf_file() is getting called during the ALTER SYSTEM call. Here write happen only when someone explicitly run the ALTER SYSTEM call. This is administrator call and so doesn't seem like necessary to add separate wait event for this. PFA latest patch with other fixes. > > On Wed, Mar 8, 2017 at 4:50 PM, Rushabh Lathia > wrote: > >> >> >> On Wed, Mar 8, 2017 at 8:23 AM, Robert Haas >> wrote: >> >>> On Tue, Mar 7, 2017 at 9:32 PM, Amit Kapila >>> wrote: >>> > On Tue, Mar 7, 2017 at 9:16 PM, Robert Haas >>> wrote: >>> >> On Mon, Mar 6, 2017 at 9:09 PM, Amit Kapila >>> wrote: >>> >>> Sure, if you think both Writes and Reads at OS level can have some >>> >>> chance of blocking in obscure cases, then we should add a wait event >>> >>> for them. >>> >> >>> >> I think writes have a chance of blocking in cases even in cases that >>> >> are not very obscure at all. >>> > >>> > Point taken for writes, but I think in general we should have some >>> > criteria based on which we can decide whether to have a wait event for >>> > a particular call. It should not happen that we have tons of wait >>> > events and out of which, only a few are helpful in most of the cases >>> > in real-world scenarios. >>> >>> Well, the problem is that if you pick and choose which wait events to >>> add based on what you think will be common, you're actually kind of >>> hosing yourself. Because now when something uncommon happens, suddenly >>> you don't get any wait event data and you can't tell what's happening. >>> I think the number of new wait events added by Rushabh's patch is >>> wholly reasonable. Yeah, some of those are going to be a lot more >>> common than others, but so what? We add wait events so that we can >>> find out what's going on. I don't want to sometimes know when a >>> backend is blocked on an I/O. I want to ALWAYS know. >>> >>> >> Yes, I agree with Robert. Knowing what we want and what we don't >> want is difficult to judge. Something which we might think its not useful >> information, and later of end up into situation where we re-think about >> adding those missing stuff is not good. Having more information about >> the system, specially for monitoring purpose is always good. >> >> I am attaching another version of the patch, as I found stupid mistake >> in the earlier version of patch, where I missed to initialize initial >> value to >> WaitEventIO enum. Also earlier version was not getting cleanly apply on >> the current version of sources. >> >> >> >> -- >> Rushabh Lathia >> EnterpriseDB: http://www.enterprisedb.com >> The Enterprise PostgreSQL Company >> >> >> >> -- >> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) >> To make changes to your subscription: >> http://www.postgresql.org/mailpref/pgsql-hackers >> >> > Regards, Rushabh Lathia www.EnterpriseDB.com The Enterprise PostgreSQL Company diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index 4d03531..45fedec 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -716,6 +716,12 @@ postgres 27093 0.0 0.0 30096 2752 ?Ss 11:34 0:00 postgres: ser point. + + + IO: The server process is waiting for a IO to complete. + wait_event will identify the specific wait point. + + @@ -1272,6 +1278,276 @@ postgres 27093 0.0 0.0 30096 2752 ?Ss 11:34 0:00 postgres: ser RecoveryApplyDelay Waiting to apply WAL at recovery because it is delayed. + + IO + ReadDataBlock +
Re: [HACKERS] wait events for disk I/O
Hello, I applied and tested this patch on latest sources and it works fine. Following are some comments, >+ /* Wait event for SNRU */ >+ WAIT_EVENT_READ_SLRU_PAGE, Typo in the comment. >FileWriteback(v->mdfd_vfd, seekpos, (off_t) BLCKSZ * nflush, WAIT_EVENT_FLUSH_DATA_BLOCK); This call is inside mdwriteback() which can flush more than one block so should WAIT_EVENT _FLUSH_DATA_BLOCK be renamed to WAIT_EVENT_FLUSH_DATA_BLOCKS? Should calls to write() in following functions be tracked too? qtext_store() - This is related to pg_stat_statements dsm_impl_mmap() - This is in relation to creating dsm segments. write_auto_conf_file()- This is called when updated configuration parameters are written to a temp file. Thank you, Rahila Syed On Wed, Mar 8, 2017 at 4:50 PM, Rushabh Lathia wrote: > > > On Wed, Mar 8, 2017 at 8:23 AM, Robert Haas wrote: > >> On Tue, Mar 7, 2017 at 9:32 PM, Amit Kapila >> wrote: >> > On Tue, Mar 7, 2017 at 9:16 PM, Robert Haas >> wrote: >> >> On Mon, Mar 6, 2017 at 9:09 PM, Amit Kapila >> wrote: >> >>> Sure, if you think both Writes and Reads at OS level can have some >> >>> chance of blocking in obscure cases, then we should add a wait event >> >>> for them. >> >> >> >> I think writes have a chance of blocking in cases even in cases that >> >> are not very obscure at all. >> > >> > Point taken for writes, but I think in general we should have some >> > criteria based on which we can decide whether to have a wait event for >> > a particular call. It should not happen that we have tons of wait >> > events and out of which, only a few are helpful in most of the cases >> > in real-world scenarios. >> >> Well, the problem is that if you pick and choose which wait events to >> add based on what you think will be common, you're actually kind of >> hosing yourself. Because now when something uncommon happens, suddenly >> you don't get any wait event data and you can't tell what's happening. >> I think the number of new wait events added by Rushabh's patch is >> wholly reasonable. Yeah, some of those are going to be a lot more >> common than others, but so what? We add wait events so that we can >> find out what's going on. I don't want to sometimes know when a >> backend is blocked on an I/O. I want to ALWAYS know. >> >> > Yes, I agree with Robert. Knowing what we want and what we don't > want is difficult to judge. Something which we might think its not useful > information, and later of end up into situation where we re-think about > adding those missing stuff is not good. Having more information about > the system, specially for monitoring purpose is always good. > > I am attaching another version of the patch, as I found stupid mistake > in the earlier version of patch, where I missed to initialize initial > value to > WaitEventIO enum. Also earlier version was not getting cleanly apply on > the current version of sources. > > > > -- > Rushabh Lathia > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company > > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers > >
Re: [HACKERS] wait events for disk I/O
On Thu, Mar 9, 2017 at 10:54 AM, Rushabh Lathia wrote: > Thanks Rajkumar for performing tests on this patch. > > Yes, I also noticed similar results in my testing. Additionally sometime I > also > noticed ReadSLRUPage event on my system. > > I also run the reindex database command and I notices below IO events. > > SyncImmedRelation, > WriteDataBlock > WriteBuffile, > WriteXLog, > ReadDataBlock I have tried for generating some more events, by running pgbench on master/slave configuration, able to see three more events. WriteInitXLogFile SyncInitXLogFile ReadXLog -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] wait events for disk I/O
Thanks Rajkumar for performing tests on this patch. Yes, I also noticed similar results in my testing. Additionally sometime I also noticed ReadSLRUPage event on my system. I also run the reindex database command and I notices below IO events. SyncImmedRelation, WriteDataBlock WriteBuffile, WriteXLog, ReadDataBlock On Wed, Mar 8, 2017 at 6:41 PM, Rajkumar Raghuwanshi < rajkumar.raghuwan...@enterprisedb.com> wrote: > On Wed, Mar 8, 2017 at 4:50 PM, Rushabh Lathia > wrote: > >> I am attaching another version of the patch, as I found stupid mistake >> in the earlier version of patch, where I missed to initialize initial >> value to >> WaitEventIO enum. Also earlier version was not getting cleanly apply on >> the current version of sources. >> > > I have applied attached patch, set shared_buffers to 128kB and ran > pgbench, I am able to see below distinct IO events. > > --with ./pgbench -i -s 500 postgres > application_namewait_event_typewait_event query > pgbench IO ExtendDataBlock > copy pgbench_account > pgbench IO WriteXLog > copy pgbench_account > pgbench IO WriteDataBlock > copy pgbench_account > pgbench IO ReadDataBlock > vacuum analyze pgben > pgbench IO ReadBuffile >alter table pgbench_ > > --with ./pgbench -T 600 postgres (readwrite) > application_namewait_event_typewait_event query > pgbench IO ReadDataBlock > UPDATE pgbench_accou > pgbench IO WriteDataBlock > UPDATE pgbench_telle > IO SyncDataBlock > > pgbench IO SyncDataBlock > UPDATE pgbench_telle > IO SyncDataBlock > autovacuum: VACUUM A > pgbench IO WriteXLog > END; > pgbench IO ExtendDataBlock > copy pgbench_account > > --with ./pgbench -T 600 -S postgres (select only) > application_namewait_event_typewait_event query > pgbench IO ReadDataBlock > SELECT abalance FROM > > Attached excel with all IO event values. > > Thanks & Regards, > Rajkumar Raghuwanshi > QMG, EnterpriseDB Corporation > > > -- Rushabh Lathia www.EnterpriseDB.com
Re: [HACKERS] wait events for disk I/O
On Wed, Mar 8, 2017 at 4:50 PM, Rushabh Lathia wrote: > I am attaching another version of the patch, as I found stupid mistake > in the earlier version of patch, where I missed to initialize initial > value to > WaitEventIO enum. Also earlier version was not getting cleanly apply on > the current version of sources. > I have applied attached patch, set shared_buffers to 128kB and ran pgbench, I am able to see below distinct IO events. --with ./pgbench -i -s 500 postgres application_namewait_event_typewait_event query pgbench IO ExtendDataBlock copy pgbench_account pgbench IO WriteXLog copy pgbench_account pgbench IO WriteDataBlock copy pgbench_account pgbench IO ReadDataBlock vacuum analyze pgben pgbench IO ReadBuffile alter table pgbench_ --with ./pgbench -T 600 postgres (readwrite) application_namewait_event_typewait_event query pgbench IO ReadDataBlock UPDATE pgbench_accou pgbench IO WriteDataBlock UPDATE pgbench_telle IO SyncDataBlock pgbench IO SyncDataBlock UPDATE pgbench_telle IO SyncDataBlock autovacuum: VACUUM A pgbench IO WriteXLog END; pgbench IO ExtendDataBlock copy pgbench_account --with ./pgbench -T 600 -S postgres (select only) application_namewait_event_typewait_event query pgbench IO ReadDataBlock SELECT abalance FROM Attached excel with all IO event values. Thanks & Regards, Rajkumar Raghuwanshi QMG, EnterpriseDB Corporation wait_event_for_disk_io.ods Description: application/vnd.oasis.opendocument.spreadsheet -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] wait events for disk I/O
On Wed, Mar 8, 2017 at 8:23 AM, Robert Haas wrote: > On Tue, Mar 7, 2017 at 9:32 PM, Amit Kapila > wrote: > > On Tue, Mar 7, 2017 at 9:16 PM, Robert Haas > wrote: > >> On Mon, Mar 6, 2017 at 9:09 PM, Amit Kapila > wrote: > >>> Sure, if you think both Writes and Reads at OS level can have some > >>> chance of blocking in obscure cases, then we should add a wait event > >>> for them. > >> > >> I think writes have a chance of blocking in cases even in cases that > >> are not very obscure at all. > > > > Point taken for writes, but I think in general we should have some > > criteria based on which we can decide whether to have a wait event for > > a particular call. It should not happen that we have tons of wait > > events and out of which, only a few are helpful in most of the cases > > in real-world scenarios. > > Well, the problem is that if you pick and choose which wait events to > add based on what you think will be common, you're actually kind of > hosing yourself. Because now when something uncommon happens, suddenly > you don't get any wait event data and you can't tell what's happening. > I think the number of new wait events added by Rushabh's patch is > wholly reasonable. Yeah, some of those are going to be a lot more > common than others, but so what? We add wait events so that we can > find out what's going on. I don't want to sometimes know when a > backend is blocked on an I/O. I want to ALWAYS know. > > Yes, I agree with Robert. Knowing what we want and what we don't want is difficult to judge. Something which we might think its not useful information, and later of end up into situation where we re-think about adding those missing stuff is not good. Having more information about the system, specially for monitoring purpose is always good. I am attaching another version of the patch, as I found stupid mistake in the earlier version of patch, where I missed to initialize initial value to WaitEventIO enum. Also earlier version was not getting cleanly apply on the current version of sources. -- Rushabh Lathia EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company wait_event_for_disk_IO_v2.patch Description: application/download -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] wait events for disk I/O
On Tue, Mar 7, 2017 at 9:32 PM, Amit Kapila wrote: > On Tue, Mar 7, 2017 at 9:16 PM, Robert Haas wrote: >> On Mon, Mar 6, 2017 at 9:09 PM, Amit Kapila wrote: >>> Sure, if you think both Writes and Reads at OS level can have some >>> chance of blocking in obscure cases, then we should add a wait event >>> for them. >> >> I think writes have a chance of blocking in cases even in cases that >> are not very obscure at all. > > Point taken for writes, but I think in general we should have some > criteria based on which we can decide whether to have a wait event for > a particular call. It should not happen that we have tons of wait > events and out of which, only a few are helpful in most of the cases > in real-world scenarios. Well, the problem is that if you pick and choose which wait events to add based on what you think will be common, you're actually kind of hosing yourself. Because now when something uncommon happens, suddenly you don't get any wait event data and you can't tell what's happening. I think the number of new wait events added by Rushabh's patch is wholly reasonable. Yeah, some of those are going to be a lot more common than others, but so what? We add wait events so that we can find out what's going on. I don't want to sometimes know when a backend is blocked on an I/O. I want to ALWAYS know. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] wait events for disk I/O
On Tue, Mar 7, 2017 at 9:16 PM, Robert Haas wrote: > On Mon, Mar 6, 2017 at 9:09 PM, Amit Kapila wrote: >> Sure, if you think both Writes and Reads at OS level can have some >> chance of blocking in obscure cases, then we should add a wait event >> for them. > > I think writes have a chance of blocking in cases even in cases that > are not very obscure at all. > Point taken for writes, but I think in general we should have some criteria based on which we can decide whether to have a wait event for a particular call. It should not happen that we have tons of wait events and out of which, only a few are helpful in most of the cases in real-world scenarios. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] wait events for disk I/O
On Mon, Mar 6, 2017 at 9:09 PM, Amit Kapila wrote: > Sure, if you think both Writes and Reads at OS level can have some > chance of blocking in obscure cases, then we should add a wait event > for them. I think writes have a chance of blocking in cases even in cases that are not very obscure at all. >> I think somebody should try a pgbench run with this patch applied, >> using a scale factor greater than shared_buffers, and generate a wait >> event profile, just to see if these are showing up and how often. > > Yeah, that makes sense to me and we should try for both read-write and > read-only tests. wfm. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] wait events for disk I/O
On Tue, Mar 7, 2017 at 5:27 AM, Robert Haas wrote: > On Mon, Mar 6, 2017 at 3:27 AM, Rushabh Lathia > wrote: >> Yes, I thought of adding wait event only for the sync but then recording the >> wait event for both write and sync. I understand that OS level writes are >> cheap but it still have some cost attached to that. Also I thought for the >> monitoring tool being develop using this wait events, will have more useful >> capture data if we try to collect as much info as we can. Or may be not. >> >> I am open for other opinion/suggestions. > > Writes are NOT always fast. I've seen cases of write() blocking for > LONG periods of time on systems that are in the process of failing, or > just busy. So I think we certainly want to advertise a wait event for > those. > Sure, if you think both Writes and Reads at OS level can have some chance of blocking in obscure cases, then we should add a wait event for them. > Likewise, I agree that the prefetch call probably SHOULDN'T block, but > just because it shouldn't doesn't mean it never will. > > I think somebody should try a pgbench run with this patch applied, > using a scale factor greater than shared_buffers, and generate a wait > event profile, just to see if these are showing up and how often. > Yeah, that makes sense to me and we should try for both read-write and read-only tests. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] wait events for disk I/O
On Tue, Mar 7, 2017 at 8:57 AM, Robert Haas wrote: > On Mon, Mar 6, 2017 at 3:27 AM, Rushabh Lathia > wrote: >> Yes, I thought of adding wait event only for the sync but then recording the >> wait event for both write and sync. I understand that OS level writes are >> cheap but it still have some cost attached to that. Also I thought for the >> monitoring tool being develop using this wait events, will have more useful >> capture data if we try to collect as much info as we can. Or may be not. >> >> I am open for other opinion/suggestions. > > Writes are NOT always fast. I've seen cases of write() blocking for > LONG periods of time on systems that are in the process of failing, or > just busy. So I think we certainly want to advertise a wait event for > those. +1. I see that quite often really happens, and a lot particularly on overloaded VMs. Having to debug such systems until you notice that what is slow is just the underlying system is no fun. So being able to show up an event state in memory where we can say that things are stuck on I/O just with PG interface can really help in diagnostics without having to look at storage-level logs for a first impression. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] wait events for disk I/O
On Mon, Mar 6, 2017 at 3:27 AM, Rushabh Lathia wrote: > Yes, I thought of adding wait event only for the sync but then recording the > wait event for both write and sync. I understand that OS level writes are > cheap but it still have some cost attached to that. Also I thought for the > monitoring tool being develop using this wait events, will have more useful > capture data if we try to collect as much info as we can. Or may be not. > > I am open for other opinion/suggestions. Writes are NOT always fast. I've seen cases of write() blocking for LONG periods of time on systems that are in the process of failing, or just busy. So I think we certainly want to advertise a wait event for those. Likewise, I agree that the prefetch call probably SHOULDN'T block, but just because it shouldn't doesn't mean it never will. I think somebody should try a pgbench run with this patch applied, using a scale factor greater than shared_buffers, and generate a wait event profile, just to see if these are showing up and how often. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] wait events for disk I/O
On Sat, Mar 4, 2017 at 7:53 PM, Amit Kapila wrote: > On Mon, Feb 20, 2017 at 4:04 PM, Rushabh Lathia > wrote: > > > > My colleague Rahila reported compilation issue with > > the patch. Issue was only coming with we do the clean > > build on the branch. > > > > Fixed the same into latest version of patch. > > > > Few assorted comments: > > 1. > + > + WriteBuffile > + Waiting during > buffile read operation. > + > > Operation name and definition are not matching. > > Will fix this. > > 2. > +FilePrefetch(File file, off_t offset, int amount, uint32 wait_event_info) > { > #if defined(USE_POSIX_FADVISE) && defined(POSIX_FADV_WILLNEED) > int returnCode; > @@ -1527,8 +1527,10 @@ FilePrefetch(File file, off_t offset, int amount) > if (returnCode < 0) > return returnCode; > > + pgstat_report_wait_start(wait_event_info); > returnCode = posix_fadvise(VfdCache[file].fd, offset, amount, > POSIX_FADV_WILLNEED); > + pgstat_report_wait_end(); > > AFAIK, this call is non-blocking and will just initiate a read, so do > you think we should record wait event for such a call. > > Yes, prefecth call is a non-blocking and will just initiate a read. But this info about the prefetch into wait events will give more info about the system. 3. > - written = FileWrite(src->vfd, waldata_start, len); > + written = FileWrite(src->vfd, waldata_start, len, > + WAIT_EVENT_WRITE_REWRITE_DATA_BLOCK); > if (written != len) > ereport(ERROR, > (errcode_for_file_access(), > @@ -957,7 +960,7 @@ logical_end_heap_rewrite(RewriteState state) > hash_seq_init(&seq_status, state->rs_logical_mappings); > while ((src = (RewriteMappingFile *) hash_seq_search(&seq_status)) != > NULL) > { > - if (FileSync(src->vfd) != 0) > + if (FileSync(src->vfd, WAIT_EVENT_SYNC_REWRITE_DATA_BLOCK) != 0) > > > Do we want to consider recording wait event for both write and sync? > It seems to me OS level writes are relatively cheap and sync calls are > expensive, so shouldn't we just log for sync calls? I could see the > similar usage at multiple places in the patch. > > Yes, I thought of adding wait event only for the sync but then recording the wait event for both write and sync. I understand that OS level writes are cheap but it still have some cost attached to that. Also I thought for the monitoring tool being develop using this wait events, will have more useful capture data if we try to collect as much info as we can. Or may be not. I am open for other opinion/suggestions. -- > With Regards, > Amit Kapila. > EnterpriseDB: http://www.enterprisedb.com > -- Rushabh Lathia
Re: [HACKERS] wait events for disk I/O
On Mon, Feb 20, 2017 at 4:04 PM, Rushabh Lathia wrote: > > My colleague Rahila reported compilation issue with > the patch. Issue was only coming with we do the clean > build on the branch. > > Fixed the same into latest version of patch. > Few assorted comments: 1. + + WriteBuffile + Waiting during buffile read operation. + Operation name and definition are not matching. 2. +FilePrefetch(File file, off_t offset, int amount, uint32 wait_event_info) { #if defined(USE_POSIX_FADVISE) && defined(POSIX_FADV_WILLNEED) int returnCode; @@ -1527,8 +1527,10 @@ FilePrefetch(File file, off_t offset, int amount) if (returnCode < 0) return returnCode; + pgstat_report_wait_start(wait_event_info); returnCode = posix_fadvise(VfdCache[file].fd, offset, amount, POSIX_FADV_WILLNEED); + pgstat_report_wait_end(); AFAIK, this call is non-blocking and will just initiate a read, so do you think we should record wait event for such a call. 3. - written = FileWrite(src->vfd, waldata_start, len); + written = FileWrite(src->vfd, waldata_start, len, + WAIT_EVENT_WRITE_REWRITE_DATA_BLOCK); if (written != len) ereport(ERROR, (errcode_for_file_access(), @@ -957,7 +960,7 @@ logical_end_heap_rewrite(RewriteState state) hash_seq_init(&seq_status, state->rs_logical_mappings); while ((src = (RewriteMappingFile *) hash_seq_search(&seq_status)) != NULL) { - if (FileSync(src->vfd) != 0) + if (FileSync(src->vfd, WAIT_EVENT_SYNC_REWRITE_DATA_BLOCK) != 0) Do we want to consider recording wait event for both write and sync? It seems to me OS level writes are relatively cheap and sync calls are expensive, so shouldn't we just log for sync calls? I could see the similar usage at multiple places in the patch. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] wait events for disk I/O
My colleague Rahila reported compilation issue with the patch. Issue was only coming with we do the clean build on the branch. Fixed the same into latest version of patch. Thanks, On Tue, Jan 31, 2017 at 11:09 AM, Rushabh Lathia wrote: > > > On Tue, Jan 31, 2017 at 8:54 AM, Michael Paquier < > michael.paqu...@gmail.com> wrote: > >> On Mon, Jan 30, 2017 at 10:01 PM, Rushabh Lathia >> wrote: >> > Attached is the patch, which extend the existing wait event >> infrastructure >> > to implement the wait events for the disk I/O. Basically >> pg_stat_activity's >> > wait event information to show data about disk I/O as well as IPC >> primitives. >> > >> > Implementation details: >> > >> > - Added PG_WAIT_IO to pgstat.h and a new enum WaitEventIO >> > - Added a wait_event_info argument to FileRead, FileWrite, FilePrefetch, >> > FileWriteback, FileSync, and FileTruncate. Set this wait event just >> before >> > performing the file system operation and clear it just after. >> > - Pass down an appropriate wait event from caller of any of those >> > functions. >> > - Also set and clear a wait event around standalone calls to read(), >> > write(), fsync() in other parts of the system. >> > - Added documentation for all newly added wait event. >> >> Looks neat, those are unlikely to overlap with other wait events. >> > > Thanks. > > >> >> > Open issue: >> > - Might missed few standalone calls to read(), write(), etc which need >> > to pass the wait_event_info. >> >> It may be an idea to use LD_PRELOAD with custom versions of read(), >> write() and fsync(), and look at the paths where no flags are set in >> MyProc->wait_event_info, and log information when that happens. >> >> > Yes, may be I will try this. > > >> > Thanks to my colleague Robert Haas for his help in design. >> > Please let me know your thought, and thanks for reading. >> >> Did you consider a wrapper of the type pg_read_event() or >> pg_write_event(), etc.? >> > > I thought on that, but eventually stick to this approach as it looks > more neat and uniform with other wait event implementation. > > > >> -- >> Michael >> > > > > Thanks, > Rushabh Lathia > www.EnterpriseDB.com > -- Rushabh Lathia wait_event_for_disk_IO_v1.patch Description: application/download -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] wait events for disk I/O
On Tue, Jan 31, 2017 at 8:54 AM, Michael Paquier wrote: > On Mon, Jan 30, 2017 at 10:01 PM, Rushabh Lathia > wrote: > > Attached is the patch, which extend the existing wait event > infrastructure > > to implement the wait events for the disk I/O. Basically > pg_stat_activity's > > wait event information to show data about disk I/O as well as IPC > primitives. > > > > Implementation details: > > > > - Added PG_WAIT_IO to pgstat.h and a new enum WaitEventIO > > - Added a wait_event_info argument to FileRead, FileWrite, FilePrefetch, > > FileWriteback, FileSync, and FileTruncate. Set this wait event just > before > > performing the file system operation and clear it just after. > > - Pass down an appropriate wait event from caller of any of those > > functions. > > - Also set and clear a wait event around standalone calls to read(), > > write(), fsync() in other parts of the system. > > - Added documentation for all newly added wait event. > > Looks neat, those are unlikely to overlap with other wait events. > Thanks. > > > Open issue: > > - Might missed few standalone calls to read(), write(), etc which need > > to pass the wait_event_info. > > It may be an idea to use LD_PRELOAD with custom versions of read(), > write() and fsync(), and look at the paths where no flags are set in > MyProc->wait_event_info, and log information when that happens. > > Yes, may be I will try this. > > Thanks to my colleague Robert Haas for his help in design. > > Please let me know your thought, and thanks for reading. > > Did you consider a wrapper of the type pg_read_event() or > pg_write_event(), etc.? > I thought on that, but eventually stick to this approach as it looks more neat and uniform with other wait event implementation. > -- > Michael > Thanks, Rushabh Lathia www.EnterpriseDB.com
Re: [HACKERS] wait events for disk I/O
On Mon, Jan 30, 2017 at 10:01 PM, Rushabh Lathia wrote: > Attached is the patch, which extend the existing wait event infrastructure > to implement the wait events for the disk I/O. Basically pg_stat_activity's > wait event information to show data about disk I/O as well as IPC primitives. > > Implementation details: > > - Added PG_WAIT_IO to pgstat.h and a new enum WaitEventIO > - Added a wait_event_info argument to FileRead, FileWrite, FilePrefetch, > FileWriteback, FileSync, and FileTruncate. Set this wait event just before > performing the file system operation and clear it just after. > - Pass down an appropriate wait event from caller of any of those > functions. > - Also set and clear a wait event around standalone calls to read(), > write(), fsync() in other parts of the system. > - Added documentation for all newly added wait event. Looks neat, those are unlikely to overlap with other wait events. > Open issue: > - Might missed few standalone calls to read(), write(), etc which need > to pass the wait_event_info. It may be an idea to use LD_PRELOAD with custom versions of read(), write() and fsync(), and look at the paths where no flags are set in MyProc->wait_event_info, and log information when that happens. > Thanks to my colleague Robert Haas for his help in design. > Please let me know your thought, and thanks for reading. Did you consider a wrapper of the type pg_read_event() or pg_write_event(), etc.? -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] wait events for disk I/O
Hi All, Attached is the patch, which extend the existing wait event infrastructure to implement the wait events for the disk I/O. Basically pg_stat_activity's wait event information to show data about disk I/O as well as IPC primitives. Implementation details: - Added PG_WAIT_IO to pgstat.h and a new enum WaitEventIO - Added a wait_event_info argument to FileRead, FileWrite, FilePrefetch, FileWriteback, FileSync, and FileTruncate. Set this wait event just before performing the file system operation and clear it just after. - Pass down an appropriate wait event from caller of any of those functions. - Also set and clear a wait event around standalone calls to read(), write(), fsync() in other parts of the system. - Added documentation for all newly added wait event. Open issue: - Might missed few standalone calls to read(), write(), etc which need to pass the wait_event_info. Thanks to my colleague Robert Haas for his help in design. Please let me know your thought, and thanks for reading. Thanks, Rushabh Lathia www.EnterpriseDB.com wait_event_disk_IO.patch Description: binary/octet-stream -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers