Vino1016 opened a new issue, #15215:
URL: https://github.com/apache/iceberg/issues/15215

   ### Query engine
   
   Engine: Flink 2.0.0
   Iceberg: 1.10.0
   
   ### Question
   
   Description:
   Hi, I’m encountering a performance warning in Iceberg:

   "Performance degraded as records with different schema are generated for the same table. Likely the DynamicRecord.schema is not reused. Reuse the same instance if the record schema is the same to improve performance."

   This warning is logged for every single data record.
   
   My Code Logic:
   
   I process upstream data in a process function. The schema of the test data is constant and never changes. Inside this process function, I construct a DynamicRecord for every incoming record.
   
   // Data processing (simplified)
   DynamicRecord dynamicRecord = new DynamicRecord(
           tableIdentifier,            // created once in open()
           ICEBERG_BRANCH,
           this.newestSchema,          // same Schema instance for every record
           genericRowData,
           this.newestPartitionSpec,   // same PartitionSpec instance for every record
           DistributionMode.NONE,
           1
   );
   dynamicRecord.setEqualityFields(xxxx.getEqualityFieldColumns());
   dynamicRecord.setUpsertMode(true);
   
   out.collect(dynamicRecord);
   
   
   The tableIdentifier, newestSchema, and newestPartitionSpec are initialized 
once in the open method and remain unchanged afterward.
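   
   For context, this is roughly how those fields are set up (a simplified sketch, not the exact code; the catalog lookup and table name here are placeholders):
   
   private transient TableIdentifier tableIdentifier;
   private transient Schema newestSchema;
   private transient PartitionSpec newestPartitionSpec;
   
   @Override
   public void open(OpenContext openContext) throws Exception {
       // placeholder catalog/table lookup; the point is that these three objects
       // are created exactly once and then reused for every outgoing record
       this.tableIdentifier = TableIdentifier.of("db", "my_table");
       Table table = catalog.loadTable(tableIdentifier);
       this.newestSchema = table.schema();           // reused Schema instance
       this.newestPartitionSpec = table.spec();      // reused PartitionSpec instance
   }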
   
   Sink Configuration:
   DynamicIcebergSink.forInput(dynamicRecordSingleOutputStreamOperator)
                   .cacheMaxSize(10)
                   .cacheRefreshMs(600000L)
                   .inputSchemasPerTableCacheMaxSize(10)
                   .immediateTableUpdate(true)
                   .overwrite(false)
                   .setAll(getSinkConfig(writeFormat))
                   .catalogLoader(TestS3Sink.getCatalogLoader(xxxx))
                   .generator(new DefaultDynamicRecordGenerator())
                   .append()
                   .setParallelism(1);
   
   Custom Generator:
   public class DefaultDynamicRecordGenerator implements DynamicRecordGenerator<DynamicRecord> {
   
       @Override
       public void open(OpenContext openContext) throws Exception {
           DynamicRecordGenerator.super.open(openContext);
       }
   
       @Override
       public void generate(DynamicRecord dynamicRecord, Collector<DynamicRecord> collector) throws Exception {
           collector.collect(dynamicRecord);
       }
   }
   
   Expected Behavior:
   
   Since the schema never changes and is always the same object, I would expect Iceberg to reuse the DynamicRecord.schema instance rather than treat it as a new schema, which is what triggers this warning.
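   
   For what it's worth, the instance reuse can be double-checked with a plain reference-equality test in the process function (debugging sketch only; LOG stands for whatever logger is in scope):
   
   // Debugging sketch: verify the exact same Schema object is used for every record
   private transient Schema previousSchema;
   
   private void checkSchemaIdentity(Schema current) {
       if (previousSchema != null && previousSchema != current) {
           // fires only if a new Schema object was created somewhere
           LOG.warn("Schema instance changed: {} -> {}",
                   System.identityHashCode(previousSchema),
                   System.identityHashCode(current));
       }
       previousSchema = current;
   }
   
   If this never logs anything, the same Schema object really is handed to every DynamicRecord, so the warning would have to come from something on the sink side.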
   
   Question:
   
   Even though I am reusing the same schema instance, could this warning still appear because of how Iceberg handles the schema internally?
   
   Apart from the way the schema is passed in, are there any other factors that could cause Iceberg to treat it as a different schema and trigger this warning?
   
   I would appreciate any help or suggestions. Thank you!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
