Hello Mikhail,

I'm sorry. Please try attached patch instead.

Thank you for having a look!

Regards
Arne

________________________________
From: Mikhail Gribkov <youzh...@gmail.com>
Sent: Thursday, July 6, 2023 13:31
To: Arne Roland <a.rol...@index.de>
Cc: Pg Hackers <pgsql-hackers@lists.postgresql.org>
Subject: Re: Permute underscore separated components of columns before fuzzy 
matching

Hello Arne,

The goal of supporting words-switching hints sounds interesting and I've tried 
to apply your patch.
The patch was applied smoothly to the latest master and check-world reported no 
problems. Although I had problems after trying to test the new functionality.

I tried to simply mix words in pg_stat_activity.wait_event_type:

postgres=# select wait_type_event from pg_stat_activity ;
2023-07-06 14:12:35.968 MSK [1480] WARNING:  detected write past chunk end in 
MessageContext 0x559d668aaf30
2023-07-06 14:12:35.968 MSK [1480] WARNING:  detected write past chunk end in 
MessageContext 0x559d668aaf30
2023-07-06 14:12:35.968 MSK [1480] WARNING:  detected write past chunk end in 
MessageContext 0x559d668aaf30
2023-07-06 14:12:35.968 MSK [1480] WARNING:  detected write past chunk end in 
MessageContext 0x559d668aaf30
WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
2023-07-06 14:12:35.968 MSK [1480] WARNING:  detected write past chunk end in 
MessageContext 0x559d668aaf30
WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
2023-07-06 14:12:35.968 MSK [1480] WARNING:  detected write past chunk end in 
MessageContext 0x559d668aaf30
WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
2023-07-06 14:12:35.968 MSK [1480] WARNING:  detected write past chunk end in 
MessageContext 0x559d668aaf30
WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
2023-07-06 14:12:35.968 MSK [1480] WARNING:  detected write past chunk end in 
MessageContext 0x559d668aaf30
WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
2023-07-06 14:12:35.968 MSK [1480] WARNING:  detected write past chunk end in 
MessageContext 0x559d668aaf30
2023-07-06 14:12:35.968 MSK [1480] WARNING:  detected write past chunk end in 
MessageContext 0x559d668aaf30
WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
2023-07-06 14:12:35.968 MSK [1480] WARNING:  detected write past chunk end in 
MessageContext 0x559d668aaf30
WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
2023-07-06 14:12:35.968 MSK [1480] WARNING:  detected write past chunk end in 
MessageContext 0x559d668aaf30
WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
2023-07-06 14:12:35.968 MSK [1480] WARNING:  detected write past chunk end in 
MessageContext 0x559d668aaf30
WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
2023-07-06 14:12:35.968 MSK [1480] WARNING:  detected write past chunk end in 
MessageContext 0x559d668aaf30
2023-07-06 14:12:35.968 MSK [1480] WARNING:  detected write past chunk end in 
MessageContext 0x559d668aaf30
WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
2023-07-06 14:12:35.968 MSK [1480] WARNING:  detected write past chunk end in 
MessageContext 0x559d668aaf30
WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
2023-07-06 14:12:35.968 MSK [1480] WARNING:  detected write past chunk end in 
MessageContext 0x559d668aaf30
2023-07-06 14:12:35.968 MSK [1480] WARNING:  detected write past chunk end in 
MessageContext 0x559d668aaf30
2023-07-06 14:12:35.968 MSK [1480] WARNING:  detected write past chunk end in 
MessageContext 0x559d668aaf30
WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
2023-07-06 14:12:35.968 MSK [1480] WARNING:  detected write past chunk end in 
MessageContext 0x559d668aaf30
WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
2023-07-06 14:12:35.968 MSK [1480] WARNING:  detected write past chunk end in 
MessageContext 0x559d668aaf30
WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
2023-07-06 14:12:35.968 MSK [1480] WARNING:  detected write past chunk end in 
MessageContext 0x559d668aaf30
WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
2023-07-06 14:12:35.968 MSK [1480] ERROR:  column "wait_type_event" does not 
exist at character 8
2023-07-06 14:12:35.968 MSK [1480] HINT:  Perhaps you meant to reference the 
column "pg_stat_activity.wait_event_type".
2023-07-06 14:12:35.968 MSK [1480] STATEMENT:  select wait_type_event from 
pg_stat_activity ;
WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
ERROR:  column "wait_type_event" does not exist
LINE 1: select wait_type_event from pg_stat_activity ;
               ^
HINT:  Perhaps you meant to reference the column 
"pg_stat_activity.wait_event_type".
postgres=#

So the desired hint is really there, but thgether with looots of warnings. For 
sure these should not be be encountered.

And no, this is not some kind of side problem brought by some other commit. The 
same request on a plain master branch performs without these warnings:

postgres=# select wait_type_event from pg_stat_activity ;
2023-07-06 14:10:17.171 MSK [22431] ERROR:  column "wait_type_event" does not 
exist at character 8
2023-07-06 14:10:17.171 MSK [22431] STATEMENT:  select wait_type_event from 
pg_stat_activity ;
ERROR:  column "wait_type_event" does not exist
LINE 1: select wait_type_event from pg_stat_activity ;
--
 best regards,
    Mikhail A. Gribkov

e-mail: youzh...@gmail.com<mailto:youzh...@gmail.com>
http://www.flickr.com/photos/youzhick/albums
http://www.strava.com/athletes/5085772
phone: +7(916)604-71-12
Telegram: @youzhick

From 72a5ec73d7326b2b1be3acf74f76a8b307d7a7dc Mon Sep 17 00:00:00 2001
From: Arne Roland <a.rol...@index.de>
Date: Mon, 10 Jul 2023 20:07:44 +0200
Subject: [PATCH] fuzzy_underscore_permutation

---
 src/backend/parser/parse_relation.c | 103 +++++++++++++++++++++-------
 1 file changed, 80 insertions(+), 23 deletions(-)

diff --git a/src/backend/parser/parse_relation.c b/src/backend/parser/parse_relation.c
index 41d60494b9..63bfd5a825 100644
--- a/src/backend/parser/parse_relation.c
+++ b/src/backend/parser/parse_relation.c
@@ -579,32 +579,13 @@ GetCTEForRTE(ParseState *pstate, RangeTblEntry *rte, int rtelevelsup)
 	return NULL;				/* keep compiler quiet */
 }
 
-/*
- * updateFuzzyAttrMatchState
- *	  Using Levenshtein distance, consider if column is best fuzzy match.
- */
 static void
-updateFuzzyAttrMatchState(int fuzzy_rte_penalty,
-						  FuzzyAttrMatchState *fuzzystate, RangeTblEntry *rte,
-						  const char *actual, const char *match, int attnum)
+updateFuzzyAttrMatchStateSingleString(int fuzzy_rte_penalty,
+							FuzzyAttrMatchState *fuzzystate, RangeTblEntry *rte,
+							const char *actual, const char *match, int attnum, int matchlen)
 {
-	int			columndistance;
-	int			matchlen;
-
-	/* Bail before computing the Levenshtein distance if there's no hope. */
-	if (fuzzy_rte_penalty > fuzzystate->distance)
-		return;
-
-	/*
-	 * Outright reject dropped columns, which can appear here with apparent
-	 * empty actual names, per remarks within scanRTEForColumn().
-	 */
-	if (actual[0] == '\0')
-		return;
-
 	/* Use Levenshtein to compute match distance. */
-	matchlen = strlen(match);
-	columndistance =
+	int columndistance =
 		varstr_levenshtein_less_equal(actual, strlen(actual), match, matchlen,
 									  1, 1, 1,
 									  fuzzystate->distance + 1
@@ -667,6 +648,82 @@ updateFuzzyAttrMatchState(int fuzzy_rte_penalty,
 	}
 }
 
+/*
+ * updateFuzzyAttrMatchState
+ *	  Using Levenshtein distance, consider if column is best fuzzy match.
+ */
+static void
+updateFuzzyAttrMatchState(int fuzzy_rte_penalty,
+						FuzzyAttrMatchState *fuzzystate, RangeTblEntry *rte,
+						const char *actual, const char *match, int attnum)
+{
+	/* Memory segment to store the current permutation of the match string. */
+	char* tmp_match;
+	/*
+	 * We count the number of underscores here, because we want to know whether we should consider
+	 * permuting underscore separated sections.
+	 */
+	int underscore_count = 0;
+	int matchlen		 = strlen(match);
+	/* We check for the amounts of underscores first, since updateFuzzyAttrMatchState has already quadratic run time. */
+	for (int i = 0; i < matchlen; i++) {
+		if (match[i] == '_')
+			underscore_count++;
+	}
+
+	/* Bail before computing the Levenshtein distance if there's no hope. */
+	if (fuzzy_rte_penalty > fuzzystate->distance)
+		return;
+
+	/*
+	 * Outright reject dropped columns, which can appear here with apparent
+	 * empty actual names, per remarks within scanRTEForColumn().
+	 */
+	if (actual[0] == '\0')
+		return;
+
+	updateFuzzyAttrMatchStateSingleString(fuzzy_rte_penalty, fuzzystate, rte, actual, match, attnum, matchlen);
+	/*
+	 * If told to, check for permuting up to three sections separated by underscores.
+	 */
+	if (underscore_count && underscore_count <= 6) {
+			tmp_match = palloc(matchlen + 1);
+			tmp_match[matchlen] = '\0';
+			for (int i = 1; i < matchlen - 1; i++) {
+				if (match[i] == '_') {
+					/* Consider swapping two sections. */
+					memcpy(tmp_match, &match[i + 1], matchlen - i - 1);
+					tmp_match[matchlen - i - 1] = '_';
+					memcpy(&tmp_match[matchlen - i + 1], match, i);
+					updateFuzzyAttrMatchStateSingleString(fuzzy_rte_penalty + 1, fuzzystate, rte, actual, tmp_match, attnum, matchlen);
+					/* Consider swapping three sections. */
+					for (int j = i + 2; j < matchlen - 1; j++) {
+						if (match[j] == '_') {
+							/*
+							 * Only consider mirroring permutations, since the three simple rotations are already
+							 * (or will be for a later i) covered above.
+							 */
+							int permutation_matrix[3][3] = {{j - i, 0, j + 1},
+									{matchlen - i, matchlen - j, 0},
+									{0, matchlen - j + i + 1, i + 1}};
+							for (int k = 0; k < 3; k++) {
+								memcpy(&tmp_match[permutation_matrix[k][0]], match, i);
+								tmp_match[permutation_matrix[k][0] + i - 1 + 1] = '_';
+								memcpy(&tmp_match[permutation_matrix[k][1]], &match[i + 1], j - i - 1);
+								tmp_match[permutation_matrix[k][1] + j - i - 1] = '_';
+								memcpy(&tmp_match[permutation_matrix[k][2]], &match[j + 1], matchlen - j - 1);
+								tmp_match[permutation_matrix[k][2] + matchlen - j - 1] = '_';
+								tmp_match[matchlen] = '\0';
+								updateFuzzyAttrMatchStateSingleString(fuzzy_rte_penalty + 1, fuzzystate, rte, actual, tmp_match, attnum, matchlen);
+							}
+						}
+					}
+				}
+			}
+			pfree(tmp_match);
+		}
+}
+
 /*
  * scanNSItemForColumn
  *	  Search the column names of a single namespace item for the given name.
-- 
2.35.3

Reply via email to