gemini-code-assist[bot] commented on code in PR #38167:
URL: https://github.com/apache/beam/pull/38167#discussion_r3075606168


##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIO.java:
##########
@@ -2754,6 +2774,44 @@ static String resolveSpannerProjectId(SpannerConfig 
config) {
         : config.getProjectId().get();
   }
 
+  @VisibleForTesting
+  static void checkTvfExistence(DatabaseClient databaseClient, List<String> 
tvfNameList) {
+    if (tvfNameList == null || tvfNameList.isEmpty()) {
+      return;
+    }
+    try (ReadOnlyTransaction tx = databaseClient.readOnlyTransaction()) {
+      StringBuilder inClauseBuilder = new StringBuilder();
+      for (int i = 0; i < tvfNameList.size(); i++) {
+        inClauseBuilder
+            .append("'")
+            .append(PartitionMetadataDao.escapeTvfName(tvfNameList.get(i)))
+            .append("'");
+        if (i < tvfNameList.size() - 1) {
+          inClauseBuilder.append(", ");
+        }
+      }
+      String inClause = inClauseBuilder.toString();
+      // Note routine_type for GSQL is 'TABLE FUNCTION', for PostgreSQL is 
'FUNCTION'.
+      String query =
+          "SELECT routine_name FROM information_schema.routines WHERE 
routine_type LIKE '%FUNCTION' AND routine_name IN ("
+              + inClause
+              + ")";
+
+      Statement statement = Statement.newBuilder(query).build();
+      ResultSet resultSet = tx.executeQuery(statement);

Review Comment:
   ![medium](https://www.gstatic.com/codereviewagent/medium-priority.svg)
   
   Building SQL queries using string concatenation is a security risk and can 
lead to SQL injection. It is also less efficient and more error-prone than 
using query parameters. Consider using a parameterized query with the `IN` 
clause. This approach is safer and dialect-neutral.
   
   ```java
         StringBuilder sql = new StringBuilder("SELECT routine_name FROM 
information_schema.routines WHERE routine_type LIKE '%FUNCTION' AND 
routine_name IN (");
         for (int i = 0; i < tvfNameList.size(); i++) {
           sql.append("@p").append(i);
           if (i < tvfNameList.size() - 1) {
             sql.append(", ");
           }
         }
         sql.append(")");
         Statement.Builder builder = Statement.newBuilder(sql.toString());
         for (int i = 0; i < tvfNameList.size(); i++) {
           builder.bind("p" + i).to(tvfNameList.get(i));
         }
         Statement statement = builder.build();
         ResultSet resultSet = tx.executeQuery(statement);
   ```



##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/changestreams/dao/ChangeStreamDao.java:
##########
@@ -95,10 +98,14 @@ public ChangeStreamResultSet changeStreamQuery(
     if (this.isPostgres()) {
       // Ensure we have determined whether change stream uses mutable key range
       if (this.isMutableChangeStream) {
-        query =
-            "SELECT * FROM \"spanner\".\"read_proto_bytes_"
-                + changeStreamName
-                + "\"($1, $2, $3, $4, null)";
+        if (tvfName == null || tvfName.equals(DEFAULT_TVF_NAME)) {
+          query =
+              "SELECT * FROM \"spanner\".\"read_proto_bytes_"
+                  + changeStreamName
+                  + "\"($1, $2, $3, $4, null)";
+        } else {
+          query = "SELECT * FROM \"spanner\".\"" + tvfName + "\"($1, $2, $3, 
$4, null)";

Review Comment:
   ![high](https://www.gstatic.com/codereviewagent/high-priority.svg)
   
   Directly concatenating `tvfName` into the SQL query string is a security 
risk (SQL injection). While TVF names cannot be parameterized in the `FROM` 
clause, they should be strictly validated or properly quoted using the database 
dialect's identifier quoting rules (e.g., double quotes for PostgreSQL) to 
ensure safety and correctness.



##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/changestreams/dao/ChangeStreamDao.java:
##########
@@ -117,16 +124,29 @@ public ChangeStreamResultSet changeStreamQuery(
               .to(heartbeatMillis)
               .build();
     } else {
-      query =
-          "SELECT * FROM READ_"
-              + changeStreamName
-              + "("
-              + "   start_timestamp => @startTimestamp,"
-              + "   end_timestamp => @endTimestamp,"
-              + "   partition_token => @partitionToken,"
-              + "   read_options => null,"
-              + "   heartbeat_milliseconds => @heartbeatMillis"
-              + ")";
+      if (this.isMutableChangeStream && tvfName != null && 
!tvfName.equals(DEFAULT_TVF_NAME)) {
+        query =
+            "SELECT * FROM "
+                + tvfName

Review Comment:
   ![high](https://www.gstatic.com/codereviewagent/high-priority.svg)
   
   Directly concatenating `tvfName` into the SQL query string is a security 
risk (SQL injection). For GoogleSQL dialect, identifiers should be wrapped in 
backticks and properly escaped if they contain backticks themselves. Since the 
TVF name is validated against `information_schema` at pipeline construction, 
the risk is reduced, but it remains a poor practice that can lead to issues 
with special characters in identifiers.



##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/changestreams/dao/PartitionMetadataDao.java:
##########
@@ -132,14 +134,41 @@ public List<String> findAllTableIndexes() {
     return result;
   }
 
+  public static String escapeTvfName(String tvfName) {
+    return tvfName.replace("'", "").replace("\"", "");
+  }

Review Comment:
   ![medium](https://www.gstatic.com/codereviewagent/medium-priority.svg)
   
   The `escapeTvfName` method is destructive as it removes all single and 
double quotes from the TVF name. This will cause validation and queries to fail 
if a TVF name legitimately contains these characters (e.g., delimited 
identifiers in Spanner). Instead of removing these characters, the code should 
handle them by using proper quoting when the name is used in SQL queries. 
Additionally, this simple replacement is not a robust way to prevent SQL 
injection.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to