Hello,

I got bitten by an inconsistency introduced about two years ago. In
generate-wait_event_types.pl, the intermediate line format is
validated with a regular expression that allows multiple tab
characters between fields. However, the fields are then extracted
using split(/\t/, ...), which assumes single-tab delimiters and yields
empty fields when the input uses runs of tabs. This leads to a
somewhat unclear error when processing input that should otherwise be
valid (*1):

> substr outside of string at ./generate-wait_event_types.pl line 243,
>  <$wait_event_names> line 434.
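
For illustration, here is a minimal standalone sketch of the mismatch.
The sample line is made up to resemble the intermediate format (class
name, event name, quoted description); it is an assumption, not the
script's actual data.

```perl
use strict;
use warnings;

# Hypothetical intermediate line with two tabs before the description.
my $line = "WaitEventLWLock\tAioUringCompletion\t\t\"Waiting for IO.\"";

# The validation regex accepts runs of tabs between fields...
my $valid = $line =~ /^(\w+)\t+(\w+)\t+("\w.*\.")$/ ? 1 : 0;

# ...but split(/\t/) treats every tab as a delimiter, so the extra tab
# produces an empty third field and pushes the description to index 3.
my @fields = split(/\t/, $line);

# Prints: valid=1 fields=4 third=''
printf "valid=%d fields=%d third='%s'\n", $valid, scalar(@fields), $fields[2];
```

The line passes validation, yet the third element handed to the rest
of the code is an empty string rather than the description.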

Since the fields are already captured by the validation regex, using
$1, $2 and $3 instead of split() avoids the inconsistency and makes
the intent clearer. The remaining split() calls on the same strings,
in the sort comparator, now split on /\t+/ for consistency.
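
As a rough sketch of the idea (not the patch itself), taking the
fields from the captures of the same match keeps validation and
extraction in agreement. The helper name parse_line and the sample
line are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper: validate and extract in a single match, so both
# steps share one definition of the delimiter.
sub parse_line
{
	my ($line) = @_;

	# Return an empty list if the line does not match the format.
	return unless $line =~ /^(\w+)\t+(\w+)\t+("\w.*\.")$/;
	return ($1, $2, $3);
}

my @parsed =
  parse_line("WaitEventLWLock\tAioUringCompletion\t\t\"Waiting for IO.\"");

# Prints: WaitEventLWLock|AioUringCompletion|"Waiting for IO."
print join('|', @parsed), "\n";
```

With this shape, any line the regex accepts yields exactly three
fields, regardless of how many tabs separate them.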

This is addressed in the attached patch.

regards.


*1:
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 0be307d2ca0..ba551938ed7 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -405,7 +405,7 @@ SerialSLRU  "Waiting to access the serializable transaction conflict SLRU cache."
 SubtransSLRU   "Waiting to access the sub-transaction SLRU cache."
 XactSLRU       "Waiting to access the transaction status SLRU cache."
 ParallelVacuumDSA      "Waiting for parallel vacuum dynamic shared memory allocation."
-AioUringCompletion     "Waiting for another process to complete IO via io_uring."
+AioUringCompletion             "Waiting for another process to complete IO via io_uring."
 
 # No "ABI_compatibility" region here as WaitEventLWLock has its own C code.
 
From 4520e71a9f064b1c01df045d0de878a9cf15b1aa Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota....@gmail.com>
Date: Tue, 29 Jul 2025 11:45:21 +0900
Subject: [PATCH v1] Make tab delimiter handling consistent in
 generate-wait_event_types.pl

Format validation and element extraction for intermediate line strings
are inconsistent in their handling of multiple tab delimiters, which
can result in an unclear error. Extract the elements using regex
captures from the validation regex instead of a separate split() to
avoid the inconsistency. Also replace \t with \t+ in the remaining
split() calls on the same strings for consistency.
---
 src/backend/utils/activity/generate-wait_event_types.pl | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/backend/utils/activity/generate-wait_event_types.pl b/src/backend/utils/activity/generate-wait_event_types.pl
index 424ad9f115d..21abef860de 100644
--- a/src/backend/utils/activity/generate-wait_event_types.pl
+++ b/src/backend/utils/activity/generate-wait_event_types.pl
@@ -85,7 +85,7 @@ while (<$wait_event_names>)
 # Sort the lines based on the second column.
 # uc() is being used to force the comparison to be case-insensitive.
 my @lines_sorted =
-  sort { uc((split(/\t/, $a))[1]) cmp uc((split(/\t/, $b))[1]) } @lines;
+  sort { uc((split(/\t+/, $a))[1]) cmp uc((split(/\t+/, $b))[1]) } @lines;
 
 # If we are generating code, concat @lines_sorted and then
 # @abi_compatibility_lines.
@@ -101,7 +101,7 @@ foreach my $line (@lines_sorted)
 	  unless $line =~ /^(\w+)\t+(\w+)\t+("\w.*\.")$/;
 
 	(my $waitclassname, my $waiteventname, my $waitevendocsentence) =
-	  split(/\t/, $line);
+	  ($1, $2, $3);
 
 	# Generate the element name for the enums based on the
 	# description.  The C symbols are prefixed with "WAIT_EVENT_".
-- 
2.47.1
