Qiang Zhao created AVRO-4224:
--------------------------------
Summary: `SchemaParseException` when parsing a schema generated
from Protobuf with nested classes due to `$` in the namespace.
Key: AVRO-4224
URL: https://issues.apache.org/jira/browse/AVRO-4224
Project: Apache Avro
Issue Type: Bug
Components: java
Reporter: Qiang Zhao
When using `avro-protobuf` to generate an Avro schema from a Protobuf class
that contains nested classes, the `ProtobufData.get().getSchema()` method
generates a schema that includes a `$` in the namespace of the nested class.
Starting with Avro 1.12.0, the `Schema.Parser` no longer allows `$` in
namespaces, leading to a `SchemaParseException` when trying to parse the
generated schema. Version 1.11.5 parsed the same schema without error, so
this is a breaking change.
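To illustrate why such a namespace is rejected, the sketch below checks each dot-separated namespace part against the name pattern given in the Avro specification (`[A-Za-z_][A-Za-z0-9_]*`). It is a standalone approximation, not Avro's actual validation code; the class `NamespaceCheck` and its method are hypothetical. The sample input reuses the namespace part quoted in this report's exception message, and the `io.github.mattison` prefix is an assumption.

```java
import java.util.regex.Pattern;

public class NamespaceCheck {
    // Name pattern from the Avro specification; '$' is not among the allowed characters.
    private static final Pattern NAME = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    // Returns the first dot-separated part that fails the pattern, or null if every part passes.
    public static String firstInvalidPart(String namespace) {
        for (String part : namespace.split("\\.")) {
            if (!NAME.matcher(part).matches()) {
                return part;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Assumed full namespace; only the last part is taken from the reported error.
        String generated = "io.github.mattison.DataRecord$DataRecord";
        System.out.println(firstInvalidPart("io.github.mattison")); // prints: null
        System.out.println(firstInvalidPart(generated));            // prints: DataRecord$DataRecord
    }
}
```

Under this pattern the plain package name passes, while any part containing `$` fails, which matches the "Namespace part ... is invalid" wording of the exception.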
A sample project that reproduces this issue is available at:
`https://github.com/mattisonchao/avro-schema-breaking`
**Steps to Reproduce:**
1. Define a Protobuf message with a nested message, like the one below.
`data_record.proto`
```proto
syntax = "proto3";

package io.github.mattison;

option java_package = "io.github.mattison";
option java_outer_classname = "DataRecordOuterClass";

message DataRecord {
  string field1 = 1;
  int64 field2 = 2;
  NestedDataRecord field3 = 3;
  repeated NestedDataRecord fields4 = 4;

  message NestedDataRecord {
    string field1 = 1;
    int64 field2 = 2;
  }
}
```
2. In a Java application, use
`org.apache.avro.protobuf.ProtobufData.get().getSchema()` to generate an Avro
schema from the compiled Protobuf class.
3. Attempt to parse the generated schema string with
`new Schema.Parser().parse()`. The call throws a `SchemaParseException`.
`Application.java`
```java
package io.github.mattison;

import org.apache.avro.Schema;
import org.apache.avro.protobuf.ProtobufData;

public class Application {
    public static void main(String[] args) {
        final Schema schema =
                ProtobufData.get().getSchema(DataRecordOuterClass.DataRecord.class);
        final Schema.Parser parser = new Schema.Parser();

        // The following line will throw an exception with avro-protobuf >= 1.12.0
        parser.parse(schema.toString());
        System.out.println(parser);
    }
}
```
**Expected Behavior:**
The schema should be parsed successfully, as it was in Avro 1.11.5. The `$`
character in the namespace, which is automatically generated by `ProtobufData`
for nested classes, should either be handled gracefully by the parser or
avoided during schema generation.
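As a rough illustration of the second option, the sketch below replaces `$` with a character the name rules accept before the namespace reaches the parser. This is a hypothetical helper, not part of Avro or `avro-protobuf`; the replacement character `_`, the class name `NamespaceSanitizer`, and the sample namespace are all assumptions. Note that rewriting the namespace changes the full names in the schema, so it may not suit code that resolves Java classes from those names.

```java
import java.util.regex.Pattern;

public class NamespaceSanitizer {
    // Name pattern from the Avro specification, applied to each namespace part.
    private static final Pattern NAME = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    // Hypothetical helper: replaces every '$' with '_' (the replacement is an assumption).
    public static String sanitize(String namespace) {
        return namespace.replace('$', '_');
    }

    // Returns true when every dot-separated part matches the name pattern.
    public static boolean isValid(String namespace) {
        for (String part : namespace.split("\\.")) {
            if (!NAME.matcher(part).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Assumed namespace shape for the nested message.
        String generated = "io.github.mattison.DataRecord$DataRecord";
        String cleaned = sanitize(generated);
        System.out.println(cleaned + " valid=" + isValid(cleaned));
    }
}
```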
**Actual Behavior:**
A `SchemaParseException` is thrown, preventing the schema from being parsed.
**Stack Trace:**
```
Exception in thread "main" org.apache.avro.SchemaParseException: Namespace part "DataRecord$DataRecord" is invalid: Illegal character in: DataRecord$DataRecord
    at org.apache.avro.ParseContext.validateName(ParseContext.java:241)
    at org.apache.avro.ParseContext.requireValidFullName(ParseContext.java:232)
    at org.apache.avro.ParseContext.put(ParseContext.java:213)
    at org.apache.avro.Schema.parseRecord(Schema.java:1882)
    at org.apache.avro.Schema.parse(Schema.java:1836)
    at org.apache.avro.Schema.parseUnion(Schema.java:1972)
    at org.apache.avro.Schema.parse(Schema.java:1849)
    at org.apache.avro.Schema.parseField(Schema.java:1892)
    at org.apache.avro.Schema.parseRecord(Schema.java:1872)
    at org.apache.avro.Schema.parse(Schema.java:1836)
    at org.apache.avro.Schema$Parser.parse(Schema.java:1539)
    at org.apache.avro.Schema$Parser.parse(Schema.java:1516)
    at io.github.mattison.Application.main(Application.java:13)
```
This issue blocks the upgrade path for users who rely on Avro's Protobuf
compatibility and have nested message definitions.