Hello, I got bitten by an inconsistency introduced about two years ago. In the script generate-wait_event_types.pl, the intermediate line format is validated with a regular expression that allows multiple tab characters between fields. However, the fields are then extracted using split(/\t/, ...), which assumes single-tab delimiters and yields empty fields when two fields are separated by more than one tab. This leads to a somewhat unclear error when processing input that should otherwise be valid (*1):

> substr outside of string at ./generate-wait_event_types.pl line 243,
> <$wait_event_names> line 434.

Since the data was already captured via regex, using $1, $2 and $3 instead of split() avoids the inconsistency and makes the intent clearer. A related adjustment was made elsewhere in the script to improve consistency.

This is addressed in the attached patch.

regards.

*1:
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 0be307d2ca0..ba551938ed7 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -405,7 +405,7 @@ SerialSLRU "Waiting to access the serializable transaction conflict SLRU cache."
 SubtransSLRU "Waiting to access the sub-transaction SLRU cache."
 XactSLRU "Waiting to access the transaction status SLRU cache."
 ParallelVacuumDSA "Waiting for parallel vacuum dynamic shared memory allocation."
-AioUringCompletion "Waiting for another process to complete IO via io_uring."
+AioUringCompletion "Waiting for another process to complete IO via io_uring."
 
 # No "ABI_compatibility" region here as WaitEventLWLock has its own C code.
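
For illustration, the mismatch can be reproduced outside the script with a minimal sketch like the one below. The sample line (including the "WaitEventLWLock" first field) and the variable names are assumptions made up for this example, not copied from generate-wait_event_types.pl; only the validation regex and the two split() patterns are taken from the patch.

```perl
use strict;
use warnings;

# Hypothetical intermediate line: class, event name, doc sentence.
# Note the TWO tabs between the event name and the sentence.
my $line = "WaitEventLWLock\tAioUringCompletion\t\t"
  . "\"Waiting for another process to complete IO via io_uring.\"";

# The validation regex accepts one or more tabs between fields,
# so this line is considered well-formed.
die "unable to parse line $line\n"
  unless $line =~ /^(\w+)\t+(\w+)\t+("\w.*\.")$/;
my @from_captures = ($1, $2, $3);

# split(/\t/) treats every single tab as a delimiter, so the doubled
# tab yields an empty third field and pushes the sentence to a fourth.
my @from_split = split(/\t/, $line);

# split(/\t+/) collapses runs of tabs and agrees with the regex.
my @from_split_plus = split(/\t+/, $line);

print "captures:  ", join('|', @from_captures), "\n";
print "split \\t:  ", join('|', @from_split), "\n";
print "split \\t+: ", join('|', @from_split_plus), "\n";
```

With split(/\t/), a three-variable list assignment would pick up the empty string as the doc sentence, which would be consistent with a later substr() call failing on it.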

From 4520e71a9f064b1c01df045d0de878a9cf15b1aa Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota....@gmail.com>
Date: Tue, 29 Jul 2025 11:45:21 +0900
Subject: [PATCH v1] Make tab delimiter handling consistent in
 generate-wait_event_types.pl

Format validation and element extraction for intermediate line
strings are inconsistent in their handling of multiple tab
delimiters, which can result in an unclear error. Extract the
elements using regex captures from the validation regex instead of a
separate split() to avoid the inconsistency.

Also replace \t with \t+ in the remaining split() calls on the same
strings for consistency.
---
 src/backend/utils/activity/generate-wait_event_types.pl | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/backend/utils/activity/generate-wait_event_types.pl b/src/backend/utils/activity/generate-wait_event_types.pl
index 424ad9f115d..21abef860de 100644
--- a/src/backend/utils/activity/generate-wait_event_types.pl
+++ b/src/backend/utils/activity/generate-wait_event_types.pl
@@ -85,7 +85,7 @@ while (<$wait_event_names>)
 # Sort the lines based on the second column.
 # uc() is being used to force the comparison to be case-insensitive.
 my @lines_sorted =
-  sort { uc((split(/\t/, $a))[1]) cmp uc((split(/\t/, $b))[1]) } @lines;
+  sort { uc((split(/\t+/, $a))[1]) cmp uc((split(/\t+/, $b))[1]) } @lines;
 
 # If we are generating code, concat @lines_sorted and then
 # @abi_compatibility_lines.
@@ -101,7 +101,7 @@ foreach my $line (@lines_sorted)
 	  unless $line =~ /^(\w+)\t+(\w+)\t+("\w.*\.")$/;
 
 	(my $waitclassname, my $waiteventname, my $waitevendocsentence) =
-	  split(/\t/, $line);
+	  ($1, $2, $3);
 
 	# Generate the element name for the enums based on the
 	# description. The C symbols are prefixed with "WAIT_EVENT_".
-- 
2.47.1