morozov commented on code in PR #3929:
URL: https://github.com/apache/flink-cdc/pull/3929#discussion_r1993805772
##########
flink-cdc-connect/flink-cdc-source-connectors/flink-connector-postgres-cdc/src/main/java/org/apache/flink/cdc/connectors/postgres/source/utils/PostgresQueryUtils.java:
##########
@@ -69,12 +69,17 @@ public static long queryApproximateRowCnt(JdbcConnection
jdbc, TableId tableId)
//
https://stackoverflow.com/questions/7943233/fast-way-to-discover-the-row-count-of-a-table-in-postgresql
// NOTE: it requires ANALYZE or VACUUM to be run first in PostgreSQL.
final String query =
- String.format(
- "SELECT reltuples::bigint FROM pg_class WHERE oid =
to_regclass('%s')",
- tableId.toString());
-
- return jdbc.queryAndMap(
+ "SELECT reltuples::bigint"
+ + " FROM pg_class c"
+ + " JOIN pg_namespace n ON n.oid = c.relnamespace"
Review Comment:
This is part of the bug fix. Let's assume we have a table named `Customers`
in the schema named `public`.
The previous logic would work like this:
1. Use `TableId#toString()` to build the qualified name from the schema name
and table name: `public.Customers`.
2. Use `to_regclass()` to find the table OID by its qualified name.
3. Fetch the number of rows in the table by OID.
The problem is that the value passed to `to_regclass()` is parsed as a
Postgres SQL expression but isn't formatted as such. It doesn't account for
case insensitivity of unquoted identifiers (see the
[documentation](https://www.postgresql.org/docs/current/sql-syntax-lexical.html)):
> Quoting an identifier also makes it case-sensitive, whereas unquoted names
are always folded to lower case. For example, the identifiers FOO, foo, and
"foo" are considered the same by PostgreSQL, but "Foo" and "FOO" are different
from these three and each other.
For the original query to work, it should have looked like:
```sql
SELECT reltuples::bigint FROM pg_class WHERE oid =
to_regclass('public."Customers"')
```
So it's effectively SQL in SQL.
The new query is much more straightforward and isn't prone to this issue. It
just selects the number of rows for the table whose schema name is `public` and
name is `Customers` by avoiding the "SQL in SQL" situation.
It is possible to make the old query work, but it would involve building SQL
by using `quote()` and then parsing via `to_regclass()` which is unnecessary.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]