gemini-code-assist[bot] commented on code in PR #38167:
URL: https://github.com/apache/beam/pull/38167#discussion_r3075606168
##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIO.java:
##########
@@ -2754,6 +2774,44 @@ static String resolveSpannerProjectId(SpannerConfig
config) {
: config.getProjectId().get();
}
+ @VisibleForTesting
+ static void checkTvfExistence(DatabaseClient databaseClient, List<String>
tvfNameList) {
+ if (tvfNameList == null || tvfNameList.isEmpty()) {
+ return;
+ }
+ try (ReadOnlyTransaction tx = databaseClient.readOnlyTransaction()) {
+ StringBuilder inClauseBuilder = new StringBuilder();
+ for (int i = 0; i < tvfNameList.size(); i++) {
+ inClauseBuilder
+ .append("'")
+ .append(PartitionMetadataDao.escapeTvfName(tvfNameList.get(i)))
+ .append("'");
+ if (i < tvfNameList.size() - 1) {
+ inClauseBuilder.append(", ");
+ }
+ }
+ String inClause = inClauseBuilder.toString();
+ // Note routine_type for GSQL is 'TABLE FUNCTION', for PostgreSQL is
'FUNCTION'.
+ String query =
+ "SELECT routine_name FROM information_schema.routines WHERE
routine_type LIKE '%FUNCTION' AND routine_name IN ("
+ + inClause
+ + ")";
+
+ Statement statement = Statement.newBuilder(query).build();
+ ResultSet resultSet = tx.executeQuery(statement);
Review Comment:

Building SQL queries using string concatenation is a security risk and can
lead to SQL injection. It is also less efficient and more error-prone than
using query parameters. Consider using a parameterized query with the `IN`
clause. This approach is safer and dialect-neutral.
```java
StringBuilder sql = new StringBuilder("SELECT routine_name FROM
information_schema.routines WHERE routine_type LIKE '%FUNCTION' AND
routine_name IN (");
for (int i = 0; i < tvfNameList.size(); i++) {
sql.append("@p").append(i);
if (i < tvfNameList.size() - 1) {
sql.append(", ");
}
}
sql.append(")");
Statement.Builder builder = Statement.newBuilder(sql.toString());
for (int i = 0; i < tvfNameList.size(); i++) {
builder.bind("p" + i).to(tvfNameList.get(i));
}
Statement statement = builder.build();
ResultSet resultSet = tx.executeQuery(statement);
```
##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/changestreams/dao/ChangeStreamDao.java:
##########
@@ -95,10 +98,14 @@ public ChangeStreamResultSet changeStreamQuery(
if (this.isPostgres()) {
// Ensure we have determined whether change stream uses mutable key range
if (this.isMutableChangeStream) {
- query =
- "SELECT * FROM \"spanner\".\"read_proto_bytes_"
- + changeStreamName
- + "\"($1, $2, $3, $4, null)";
+ if (tvfName == null || tvfName.equals(DEFAULT_TVF_NAME)) {
+ query =
+ "SELECT * FROM \"spanner\".\"read_proto_bytes_"
+ + changeStreamName
+ + "\"($1, $2, $3, $4, null)";
+ } else {
+ query = "SELECT * FROM \"spanner\".\"" + tvfName + "\"($1, $2, $3,
$4, null)";
Review Comment:

Directly concatenating `tvfName` into the SQL query string is a security
risk (SQL injection). While TVF names cannot be parameterized in the `FROM`
clause, they should be strictly validated or properly quoted using the database
dialect's identifier quoting rules (e.g., double quotes for PostgreSQL) to
ensure safety and correctness.
##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/changestreams/dao/ChangeStreamDao.java:
##########
@@ -117,16 +124,29 @@ public ChangeStreamResultSet changeStreamQuery(
.to(heartbeatMillis)
.build();
} else {
- query =
- "SELECT * FROM READ_"
- + changeStreamName
- + "("
- + " start_timestamp => @startTimestamp,"
- + " end_timestamp => @endTimestamp,"
- + " partition_token => @partitionToken,"
- + " read_options => null,"
- + " heartbeat_milliseconds => @heartbeatMillis"
- + ")";
+ if (this.isMutableChangeStream && tvfName != null &&
!tvfName.equals(DEFAULT_TVF_NAME)) {
+ query =
+ "SELECT * FROM "
+ + tvfName
Review Comment:

Directly concatenating `tvfName` into the SQL query string is a security
risk (SQL injection). For GoogleSQL dialect, identifiers should be wrapped in
backticks and properly escaped if they contain backticks themselves. Since the
TVF name is validated against `information_schema` at pipeline construction,
the risk is reduced, but it remains a poor practice that can lead to issues
with special characters in identifiers.
##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/changestreams/dao/PartitionMetadataDao.java:
##########
@@ -132,14 +134,41 @@ public List<String> findAllTableIndexes() {
return result;
}
+ public static String escapeTvfName(String tvfName) {
+ return tvfName.replace("'", "").replace("\"", "");
+ }
Review Comment:

The `escapeTvfName` method is destructive as it removes all single and
double quotes from the TVF name. This will cause validation and queries to fail
if a TVF name legitimately contains these characters (e.g., delimited
identifiers in Spanner). Instead of removing these characters, the code should
handle them by using proper quoting when the name is used in SQL queries.
Additionally, this simple replacement is not a robust way to prevent SQL
injection.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]