Hello!

Inspired by Simon Riggs' keynote talk at PGCounf.eu 2023 sharing list
of ideas for PostgreSQL
(https://riggs.business/blog/f/postgresql-todo-2023), I have crafted a
quick patch to do SQL syntax validation.

It is also heavily inspired by the "ruby -c" command, useful to check
syntax of Ruby programs without executing them.

For now, to keep it simple and to open discussion, I have added new
"--syntax" option into "postgres" command, since it is currently the
only one using needed parser dependency (at least per my
understanding). I tried to add this into psql or separate pg_syntax
commands, but parser is not exposed in "postgres_fe.h" and including
backend into those tools would not make most likely sense. Also syntax
could vary per backend, it makes sense to keep it in there.

It expects input on STDIN, prints out error if any and prints out
summary message (Valid SQL/Invalid SQL). On valid input it exits with
0 (success), otherwise it exits with 1 (error).

Example usage:

$ echo "SELECT 1" | src/backend/postgres --syntax
Valid SQL

$ echo "SELECT 1abc" | src/backend/postgres --syntax
ERROR:  trailing junk after numeric literal at or near "1a" at character 8
Invalid SQL

$ cat ../src/test/regress/sql/alter_operator.sql | src/backend/postgres --syntax
Valid SQL

$ cat ../src/test/regress/sql/advisory_lock.sql | src/backend/postgres --syntax
ERROR:  syntax error at or near "\" at character 99
Invalid SQL

This could be useful for manual script checks, automated script checks
and code editor integrations.

Notice it just parses the SQL, it doesn't detect any "runtime"
problems like unexisting table names, etc.

I have various ideas around this (like exposing similar functionality
directly in SQL using custom function like pg_check_syntax), but I
would like to get some feedback first.

What do you think?
enhnace
PS: I wasn't able to find any automated tests for "postgres" command
to enhance with, are there any?

PS2: Patch could be found at https://github.com/simi/postgres/pull/8 as well.
From 6ec6cc599678c0ac76f4559039ff3399f843b9b1 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Josef=20=C5=A0im=C3=A1nek?= <josef.sima...@gmail.com>
Date: Fri, 15 Dec 2023 13:00:23 +0100
Subject: [PATCH] Add --syntax option to postgres.

- it validates SQL on STDIN and reports back if valid/invalid
---
 src/backend/main/main.c | 45 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)

diff --git a/src/backend/main/main.c b/src/backend/main/main.c
index ed11e8be7fab..eb7266f9d5d3 100644
--- a/src/backend/main/main.c
+++ b/src/backend/main/main.c
@@ -40,6 +40,7 @@
 #include "utils/memutils.h"
 #include "utils/pg_locale.h"
 #include "utils/ps_status.h"
+#include "parser/parser.h"
 
 
 const char *progname;
@@ -50,6 +51,7 @@ static void startup_hacks(const char *progname);
 static void init_locale(const char *categoryname, int category, const char *locale);
 static void help(const char *progname);
 static void check_root(const char *progname);
+static void check_syntax(void);
 
 
 /*
@@ -153,6 +155,10 @@ main(int argc, char *argv[])
 			fputs(PG_BACKEND_VERSIONSTR, stdout);
 			exit(0);
 		}
+		if (strcmp(argv[1], "--syntax") == 0)
+		{
+			check_syntax();
+		}
 
 		/*
 		 * In addition to the above, we allow "--describe-config" and "-C var"
@@ -347,6 +353,7 @@ help(const char *progname)
 	printf(_("  -s                 show statistics after each query\n"));
 	printf(_("  -S WORK-MEM        set amount of memory for sorts (in kB)\n"));
 	printf(_("  -V, --version      output version information, then exit\n"));
+	printf(_("  --syntax           checks SQL on STDIN is valid, then exit\n"));
 	printf(_("  --NAME=VALUE       set run-time parameter\n"));
 	printf(_("  --describe-config  describe configuration parameters, then exit\n"));
 	printf(_("  -?, --help         show this help, then exit\n"));
@@ -422,6 +429,44 @@ check_root(const char *progname)
 #endif							/* WIN32 */
 }
 
+static void
+check_syntax() {
+	List *parsetree_list;
+	bool valid = false;
+	char buffer[1024 * 1024];
+	char ch;
+	int i = 0;
+
+	// TODO: check for buffer overflow
+	while ((ch = getchar()) != EOF)
+	{
+		buffer[i] = ch;
+		i++;
+	}
+
+	PG_TRY();
+	{
+		parsetree_list = raw_parser(buffer, RAW_PARSE_DEFAULT);
+		valid = true;
+	}
+	PG_CATCH();
+	{
+		EmitErrorReport();
+		FlushErrorState();
+		valid = false;
+	}
+	PG_END_TRY();
+
+	// TODO: translate output
+	if (valid && parsetree_list) {
+		printf("Valid SQL\n");
+		exit(0);
+	} else {
+		printf("Invalid SQL\n");
+		exit(1);
+	}
+}
+
 /*
  * At least on linux, set_ps_display() breaks /proc/$pid/environ. The
  * sanitizer library uses /proc/$pid/environ to implement getenv() as it wants

Reply via email to