danny0405 commented on code in PR #18967:
URL: https://github.com/apache/hudi/pull/18967#discussion_r3426509659


##########
hudi-common/src/main/java/org/apache/hudi/common/schema/HoodieSchema.java:
##########
@@ -348,16 +352,32 @@ private HoodieSchema(Schema avroSchema, List<HoodieSchemaField> fields) {
     this.fields = fields != null ? Collections.unmodifiableList(fields) : null;
   }
 
+  // Avro schemas are interned by identity (records of one file share the same Schema instance), so the
+  // per-record fromAvroSchema() call is a cache hit that reuses the canonical HoodieSchema and its lazily
+  // built field list / field map instead of rebuilding an O(schema width) wrapper. Misses convert and
+  // value-intern through HoodieSchemaCache so equal-but-distinct Avro schema instances still converge on
+  // one canonical HoodieSchema. Global cache for the JVM lifecycle; weakKeys lets dead schemas be GC'd.
+  private static final LoadingCache<Schema, HoodieSchema> AVRO_SCHEMA_CACHE =
+      Caffeine.newBuilder().weakKeys().maximumSize(1024)
+          .build(avroSchema -> HoodieSchemaCache.intern(convertFromAvroSchema(avroSchema)));
+
   /**
-   * Factory method to create HoodieSchema from an Avro schema.
+   * Factory method to create a {@link HoodieSchema} from an Avro schema.
+   *
+   * <p>The result is interned: passing the same Avro {@link Schema} instance (e.g. once per record)
+   * returns the canonical {@link HoodieSchema} rather than rebuilding a fresh wrapper each call.
    *
    * @param avroSchema the Avro schema to wrap
-   * @return new HoodieSchema instance
+   * @return canonical HoodieSchema instance, or {@code null} if {@code avroSchema} is null
    */
   public static HoodieSchema fromAvroSchema(Schema avroSchema) {
     if (avroSchema == null) {
       return null;
     }
+    return AVRO_SCHEMA_CACHE.get(avroSchema);

Review Comment:
   What's the difference here? It looks like the code was just moved around.
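
For reference, the behavior described in the diff's code comment (identity-keyed lookup first, value interning on a miss) could be sketched roughly as below. This is only an illustration under assumptions, not the PR's implementation: `RawSchema`, `Wrapper`, `InternSketch`, and `CONVERSIONS` are hypothetical stand-ins for the Avro `Schema`, `HoodieSchema`, and the conversion path, and a plain `IdentityHashMap` replaces the Caffeine cache, so the weak keys and the 1024-entry size bound are not modeled.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: none of these names come from Hudi or Avro.
final class InternSketch {

  /** Stand-in for an Avro schema: distinct instances can be equal by value. */
  static final class RawSchema {
    final String definition;
    RawSchema(String definition) { this.definition = definition; }
    @Override public boolean equals(Object o) {
      return o instanceof RawSchema && ((RawSchema) o).definition.equals(definition);
    }
    @Override public int hashCode() { return definition.hashCode(); }
  }

  /** Stand-in for the wrapper that is costly to rebuild per record. */
  static final class Wrapper {
    final RawSchema raw;
    Wrapper(RawSchema raw) { this.raw = raw; }
    @Override public boolean equals(Object o) {
      return o instanceof Wrapper && ((Wrapper) o).raw.equals(raw);
    }
    @Override public int hashCode() { return raw.hashCode(); }
  }

  /** Counts how often the expensive conversion actually runs. */
  static final AtomicInteger CONVERSIONS = new AtomicInteger();

  // Level 1: keys compared with ==, so a repeated instance is a cheap hit.
  // (Assumption: strong references here; the diff uses weak keys instead.)
  private static final Map<RawSchema, Wrapper> BY_IDENTITY =
      Collections.synchronizedMap(new IdentityHashMap<>());

  // Level 2: value interning, so equal wrappers collapse to one canonical object.
  private static final Map<Wrapper, Wrapper> BY_VALUE = new ConcurrentHashMap<>();

  static Wrapper convert(RawSchema raw) {
    CONVERSIONS.incrementAndGet();
    return new Wrapper(raw);
  }

  static Wrapper intern(Wrapper candidate) {
    Wrapper existing = BY_VALUE.putIfAbsent(candidate, candidate);
    return existing != null ? existing : candidate;
  }

  static Wrapper fromRaw(RawSchema raw) {
    if (raw == null) {
      return null;
    }
    // Hit: return the cached canonical wrapper. Miss: convert once, then intern.
    return BY_IDENTITY.computeIfAbsent(raw, r -> intern(convert(r)));
  }
}
```

In this sketch, calling `fromRaw` repeatedly with the same `RawSchema` instance converts only once, while an equal-but-distinct instance pays for one extra conversion and still resolves to the same canonical `Wrapper`. Whether the real change differs from the previous code in exactly this way depends on what `fromAvroSchema` did before the PR, which the hunk does not show.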



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
