danny0405 commented on code in PR #18967:
URL: https://github.com/apache/hudi/pull/18967#discussion_r3426385500
##########
hudi-common/src/main/java/org/apache/hudi/common/schema/HoodieSchema.java:
##########
@@ -348,16 +352,32 @@ private HoodieSchema(Schema avroSchema,
List<HoodieSchemaField> fields) {
this.fields = fields != null ? Collections.unmodifiableList(fields) : null;
}
+ // Avro schemas are interned by identity (records of one file share the same Schema instance), so the
+ // per-record fromAvroSchema() call is a cache hit that reuses the canonical HoodieSchema and its lazily
+ // built field list / field map instead of rebuilding an O(schema width) wrapper. Misses convert and
+ // value-intern through HoodieSchemaCache so equal-but-distinct Avro schema instances still converge on
+ // one canonical HoodieSchema. Global cache for the JVM lifecycle; weakKeys lets dead schemas be GC'd.
+ private static final LoadingCache<Schema, HoodieSchema> AVRO_SCHEMA_CACHE =
+ Caffeine.newBuilder().weakKeys().maximumSize(1024)
+ .build(avroSchema -> HoodieSchemaCache.intern(convertFromAvroSchema(avroSchema)));
+
/**
- * Factory method to create HoodieSchema from an Avro schema.
+ * Factory method to create a {@link HoodieSchema} from an Avro schema.
+ *
+ * <p>The result is interned: passing the same Avro {@link Schema} instance (e.g. once per record)
+ * returns the canonical {@link HoodieSchema} rather than rebuilding a fresh wrapper each call.
*
* @param avroSchema the Avro schema to wrap
- * @return new HoodieSchema instance
+ * @return canonical HoodieSchema instance, or {@code null} if {@code avroSchema} is null
*/
public static HoodieSchema fromAvroSchema(Schema avroSchema) {
if (avroSchema == null) {
return null;
}
+ return AVRO_SCHEMA_CACHE.get(avroSchema);
Review Comment:
Looks like we got a lot of test failures; not sure whether there are some
thread-safety issues.
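
One way to probe the thread-safety question might be a small concurrency smoke test: call the cached factory from several threads at once and check that every call returns the same instance. The sketch below is only an illustration under stated assumptions. It uses a plain `ConcurrentHashMap` as a stand-in for the Caffeine cache in this PR, so it does not reproduce `weakKeys` identity semantics, and `CacheSmokeTest`, `lookup`, and `countDistinctInstances` are hypothetical names, not Hudi APIs.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class CacheSmokeTest {
  // Assumption: stand-in for AVRO_SCHEMA_CACHE. Keys are compared with equals(),
  // not by identity, and entries are never evicted or garbage collected.
  private static final ConcurrentHashMap<String, Object> CACHE = new ConcurrentHashMap<>();

  // Hypothetical stand-in for fromAvroSchema(): returns the cached wrapper for a key,
  // creating it on the first call.
  static Object lookup(String key) {
    return CACHE.computeIfAbsent(key, k -> new Object());
  }

  // Calls lookup(key) from several threads at once and returns how many
  // distinct instances (compared by identity) were observed.
  static int countDistinctInstances(String key, int threads, int callsPerThread) {
    Set<Object> seen = Collections.synchronizedSet(
        Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>()));
    CountDownLatch start = new CountDownLatch(1);
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    for (int t = 0; t < threads; t++) {
      pool.execute(() -> {
        try {
          start.await(); // release all workers together to maximize contention
          for (int i = 0; i < callsPerThread; i++) {
            seen.add(lookup(key));
          }
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
    }
    start.countDown();
    pool.shutdown();
    try {
      if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
        pool.shutdownNow();
        throw new IllegalStateException("workers did not finish in time");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for workers", e);
    }
    return seen.size();
  }

  public static void main(String[] args) {
    System.out.println("distinct instances: " + countDistinctInstances("schema-a", 8, 1000));
  }
}
```

If the same check against the real `fromAvroSchema` returned more than one instance, or threw from inside the loader, that would point at the cache rather than at unrelated test flakiness.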