voonhous commented on code in PR #18967:
URL: https://github.com/apache/hudi/pull/18967#discussion_r3400948354


##########
hudi-common/src/main/java/org/apache/hudi/avro/AvroRecordContext.java:
##########
@@ -70,7 +71,10 @@ public AvroRecordContext() {
   public static Object getFieldValueFromIndexedRecord(
       IndexedRecord record,
       String fieldName) {
-    HoodieSchema currentSchema = HoodieSchema.fromAvroSchema(record.getSchema());
+    // Interning returns the canonical wrapper for this schema, whose lazily built field list and
+    // field map survive across calls, so the per-record cost is a cache hit instead of an
+    // O(schema width) wrapper rebuild.
+    HoodieSchema currentSchema = HoodieSchemaCache.intern(HoodieSchema.fromAvroSchema(record.getSchema()));

Review Comment:
   Agreed, added `HoodieSchemaCache.intern(Schema avroSchema)`: a weak,
identity-keyed Caffeine view in front of the existing value-interning cache.
Records from one file share the same Avro `Schema` instance, so the
per-record path is now a single identity-based cache hit with no wrapper
allocation and no logical-type dispatch. On a miss, the schema is converted
once via `fromAvroSchema` and then value-interned, so equal but distinct Avro
schema instances still converge on the same canonical `HoodieSchema`. Because
the keys are weak, entries are collected once the underlying Avro schema is
no longer referenced.
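The two-level lookup described above can be approximated with JDK types alone. This is a minimal sketch, not Hudi's implementation: `AvroSchemaStub`, `SchemaWrapper` and `SchemaInternSketch` are hypothetical stand-ins for Avro's `Schema`, `HoodieSchema` and `HoodieSchemaCache`, and the identity layer is a hand-rolled weak identity map in place of a Caffeine cache built with `weakKeys()` (which Caffeine documents as switching key comparison to identity, `==`). A hit on the exact Avro instance skips conversion entirely; a miss converts once and value-interns, so an equal but distinct instance lands on the same canonical wrapper.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of an identity-keyed front cache over a value-interning cache. */
public class SchemaInternSketch {

  /** Hypothetical stand-in for org.apache.avro.Schema: equality is by definition, not identity. */
  static final class AvroSchemaStub {
    final String definition;
    AvroSchemaStub(String definition) { this.definition = definition; }
    @Override public boolean equals(Object o) {
      return o instanceof AvroSchemaStub && ((AvroSchemaStub) o).definition.equals(definition);
    }
    @Override public int hashCode() { return definition.hashCode(); }
  }

  /** Hypothetical stand-in for HoodieSchema; counts conversions so cache hits are observable. */
  static final class SchemaWrapper {
    static int conversions = 0; // demo-only counter, not thread-safe
    final AvroSchemaStub avro;
    SchemaWrapper(AvroSchemaStub avro) { this.avro = avro; conversions++; }
    @Override public boolean equals(Object o) {
      return o instanceof SchemaWrapper && ((SchemaWrapper) o).avro.equals(avro);
    }
    @Override public int hashCode() { return avro.hashCode(); }
  }

  /** Weak key compared by identity; the hash is captured up front so it survives clearing. */
  static final class IdentityWeakKey extends WeakReference<Object> {
    private final int hash;
    IdentityWeakKey(Object referent, ReferenceQueue<Object> queue) {
      super(referent, queue);
      this.hash = System.identityHashCode(referent);
    }
    @Override public int hashCode() { return hash; }
    @Override public boolean equals(Object o) {
      if (this == o) return true; // lets a cleared key still find (and remove) its own entry
      if (!(o instanceof IdentityWeakKey)) return false;
      Object referent = get();
      return referent != null && referent == ((IdentityWeakKey) o).get();
    }
  }

  // Value-interning layer: equal schemas share one canonical wrapper (never evicted in this sketch).
  private static final Map<SchemaWrapper, SchemaWrapper> BY_VALUE = new ConcurrentHashMap<>();
  // Identity layer in front: exact Avro instance -> canonical wrapper, with weakly held keys.
  private static final Map<IdentityWeakKey, SchemaWrapper> BY_IDENTITY = new ConcurrentHashMap<>();
  private static final ReferenceQueue<Object> CLEARED = new ReferenceQueue<>();

  static SchemaWrapper intern(AvroSchemaStub avro) {
    expungeClearedKeys();
    // Probe with a throwaway key that is not registered with the queue.
    SchemaWrapper hit = BY_IDENTITY.get(new IdentityWeakKey(avro, null));
    if (hit != null) {
      return hit; // identity hit: no conversion, no new wrapper
    }
    // Miss: convert once, then converge on the canonical wrapper by value equality.
    SchemaWrapper canonical = BY_VALUE.computeIfAbsent(new SchemaWrapper(avro), w -> w);
    BY_IDENTITY.putIfAbsent(new IdentityWeakKey(avro, CLEARED), canonical);
    return canonical;
  }

  /** Drops identity entries whose Avro instance has been garbage-collected. */
  private static void expungeClearedKeys() {
    Object ref;
    while ((ref = CLEARED.poll()) != null) {
      BY_IDENTITY.remove(ref);
    }
  }

  public static void main(String[] args) {
    AvroSchemaStub fileSchema = new AvroSchemaStub("record{id:int,name:string}");
    AvroSchemaStub equalCopy = new AvroSchemaStub("record{id:int,name:string}");
    SchemaWrapper first = intern(fileSchema);
    SchemaWrapper again = intern(fileSchema);      // identity hit
    SchemaWrapper converged = intern(equalCopy);   // identity miss, value hit
    System.out.println(first == again);            // true
    System.out.println(first == converged);        // true
    System.out.println(SchemaWrapper.conversions); // 2
  }
}
```

In this sketch `BY_VALUE` holds each canonical wrapper strongly, and that wrapper references the first Avro instance it was built from, so only the identity entries for later, equal instances are actually reclaimed after GC. The same reachability question applies to any weak-keyed cache whose values reference their own keys.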


