Copilot commented on code in PR #25193:
URL: https://github.com/apache/pulsar/pull/25193#discussion_r2740428978


##########
pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/validator/StructSchemaDataValidator.java:
##########
@@ -97,4 +100,30 @@ private static void 
throwInvalidSchemaDataException(SchemaData schemaData,
         throw new InvalidSchemaDataException("Invalid schema definition data 
for "
             + schemaData.getType() + " schema", cause);
     }
+
+    static class CompatibleNameValidator implements NameValidator {
+
+        @Override
+        public Result validate(String name) {
+            if (name == null) {
+                return new Result("Null name");
+            }
+            final int length = name.length();
+            if (length == 0) {
+                return new Result("Empty name");
+            }
+            final char first = name.charAt(0);
+            if (!(Character.isLetter(first) || first == '_' || first == '$')) {
+                return new Result("Illegal initial character: " + name);
+            }
+            for (int i = 1; i < length; i++) {
+                final char c = name.charAt(i);
+                // we need to allow $ for the special case
+                if (!(Character.isLetterOrDigit(c) || c == '_' || c == '$')) {
+                    return new Result("Illegal character in: " + name);

Review Comment:
   `CompatibleNameValidator` currently uses 
`Character.isLetter/isLetterOrDigit`, which treats many Unicode characters 
(e.g. Chinese) as letters. That means names like "名字" will validate 
successfully, but the added tests expect them to be rejected and Avro’s name 
spec is ASCII-based. Please tighten the allowed character set to ASCII (e.g. 
`[A-Za-z_$][A-Za-z0-9_$]*`) so behavior matches the intended compatibility 
scope (allow `$` but not arbitrary Unicode identifiers).



##########
pulsar-broker/src/test/java/org/apache/pulsar/broker/service/schema/validator/SchemaDataValidatorTest.java:
##########
@@ -23,14 +23,22 @@
 import com.fasterxml.jackson.databind.ObjectReader;
 import com.fasterxml.jackson.module.jsonSchema.JsonSchema;
 import com.fasterxml.jackson.module.jsonSchema.JsonSchemaGenerator;
+import org.apache.avro.protobuf.ProtobufData;
 import 
org.apache.pulsar.broker.service.schema.exceptions.InvalidSchemaDataException;
+import org.apache.pulsar.broker.service.schema.proto.DataRecordOuterClass;
 import org.apache.pulsar.client.api.Schema;
+import org.apache.pulsar.client.impl.schema.ProtobufSchema;
 import org.apache.pulsar.common.protocol.schema.SchemaData;
+import org.apache.pulsar.common.schema.SchemaInfo;
 import org.apache.pulsar.common.schema.SchemaType;
 import org.apache.pulsar.common.util.ObjectMapperFactory;
+import org.junit.jupiter.api.Assertions;
 import org.testng.annotations.DataProvider;
 import org.testng.annotations.Test;

Review Comment:
   This test class uses TestNG annotations (`org.testng.annotations.*`) but the 
new assertions import `org.junit.jupiter.api.Assertions`. The broker module 
appears to standardize on TestNG (`org.testng.Assert`) and doesn’t include 
JUnit 5 dependencies, so this import may fail compilation. Please switch these 
assertions to TestNG (`org.testng.Assert` or static imports) and remove the 
JUnit Jupiter import.



##########
pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/validator/StructSchemaDataValidator.java:
##########
@@ -49,7 +52,7 @@ public void validate(SchemaData schemaData) throws 
InvalidSchemaDataException {
         byte[] data = schemaData.getData();
 
         try {
-            Schema.Parser avroSchemaParser = new Schema.Parser();
+            Schema.Parser avroSchemaParser = new 
Schema.Parser(COMPATIBLE_NAME_VALIDATOR);
             avroSchemaParser.setValidateDefaults(false);

Review Comment:
   This change fixes schema validation by using `CompatibleNameValidator`, but 
other broker-side Avro parsing paths (e.g. schema compatibility checks) still 
construct `new Schema.Parser()` with the default validator. If existing 
Protobuf-derived Avro schemas contain `$` in record names, compatibility checks 
may still fail even though validation now passes. Consider reusing the same 
validator (or `NameValidator.NO_VALIDATION`) everywhere the broker parses 
stored schema JSON for PROTOBUF/AVRO/JSON-compat paths to fully address the 
regression.



##########
pulsar-broker/src/test/java/org/apache/pulsar/broker/service/schema/validator/SchemaDataValidatorTest.java:
##########
@@ -23,14 +23,22 @@
 import com.fasterxml.jackson.databind.ObjectReader;
 import com.fasterxml.jackson.module.jsonSchema.JsonSchema;
 import com.fasterxml.jackson.module.jsonSchema.JsonSchemaGenerator;
+import org.apache.avro.protobuf.ProtobufData;
 import 
org.apache.pulsar.broker.service.schema.exceptions.InvalidSchemaDataException;
+import org.apache.pulsar.broker.service.schema.proto.DataRecordOuterClass;
 import org.apache.pulsar.client.api.Schema;
+import org.apache.pulsar.client.impl.schema.ProtobufSchema;
 import org.apache.pulsar.common.protocol.schema.SchemaData;
+import org.apache.pulsar.common.schema.SchemaInfo;

Review Comment:
   Unused imports were added here (`org.apache.avro.protobuf.ProtobufData`, 
`org.apache.pulsar.common.schema.SchemaInfo`). If the build uses 
checkstyle/spotless, this will fail. Please remove unused imports.
   ```suggestion
   import 
org.apache.pulsar.broker.service.schema.exceptions.InvalidSchemaDataException;
   import org.apache.pulsar.broker.service.schema.proto.DataRecordOuterClass;
   import org.apache.pulsar.client.api.Schema;
   import org.apache.pulsar.client.impl.schema.ProtobufSchema;
   import org.apache.pulsar.common.protocol.schema.SchemaData;
   ```



##########
pulsar-broker/src/main/proto/DataRecord.proto:
##########
@@ -0,0 +1,17 @@
+syntax = "proto3";
+
+package pulsar.schema;
+option java_package = "org.apache.pulsar.broker.service.schema.proto";
+

Review Comment:
   New proto files in this module typically include the ASF license header (see 
e.g. `pulsar-broker/src/main/proto/SchemaRegistryFormat.proto`). Please add the 
standard license header comment block at the top of this file to match project 
conventions and avoid licensing/audit issues.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to