firasomrane opened a new issue, #14077:
URL: https://github.com/apache/iceberg/issues/14077

   ### Feature Request / Improvement
   
   ## Description:
   
   - Problem: Today CommitReport exposes the overall commit 
[total-duration](https://github.com/apache/iceberg/blob/4f06b0911e870ad4ef329e77d68ae5ebe9135961/core/src/main/java/org/apache/iceberg/metrics/CommitMetrics.java#L28)
 and counters, but there is no visibility into the time spent constructing 
the on-disk metadata files for the new snapshot. This is often a 
significant portion of commit latency, and I would like to monitor it to 
catch problems caused by large metadata files.
   
   Specifically:
   - Time to build/write the manifest-list files.
   - Time to serialize and write the new metadata.json.
   
   
   ### Ask: Add optional timers to CommitReport metrics to break down commit 
time by:
   - manifest-list-build-duration
   - metadata-json-write-duration
   
   ### Proposal (high level):
   - Add two optional timers to 
[CommitMetrics](https://github.com/apache/iceberg/blob/4f06b0911e870ad4ef329e77d68ae5ebe9135961/core/src/main/java/org/apache/iceberg/metrics/CommitMetrics.java#L27)
 and 
[CommitMetricsResult](https://github.com/apache/iceberg/blob/4f06b0911e870ad4ef329e77d68ae5ebe9135961/core/src/main/java/org/apache/iceberg/metrics/CommitMetricsResult.java#L61):
   - manifest-list-build-duration (nanoseconds)
   - metadata-json-write-duration (nanoseconds)
   
   Instrument the commit path to measure:
   - Manifest list build time [inside 
SnapshotProducer.apply()](https://github.com/apache/iceberg/blob/be577eeac631d77243beb57409e476bf197f79d7/core/src/main/java/org/apache/iceberg/SnapshotProducer.java#L265-L286).
   - Table metadata write time inside `TableOperations.commit(...)` 
implementations (Hadoop, JDBC, REST).
   
   Include both timers in the CommitReport metrics JSON. This is backward 
compatible because the metrics are a map.
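
   The "optional timers in a map" idea can be sketched as follows. This is 
only an illustration under stated assumptions: `CommitMetricsSketch` and its 
methods are hypothetical and are not Iceberg's `CommitMetrics` API, and each 
timer is flattened to a single nanosecond total for brevity. The point is that 
a timer which was never recorded simply has no entry, so existing consumers 
that only read `total-duration` are unaffected.

   ```java
   import java.util.Collections;
   import java.util.LinkedHashMap;
   import java.util.Map;
   import java.util.OptionalLong;

   public class CommitMetricsSketch {
     public static final String TOTAL_DURATION = "total-duration";
     // The two names proposed in this issue.
     public static final String MANIFEST_LIST_BUILD_DURATION = "manifest-list-build-duration";
     public static final String METADATA_JSON_WRITE_DURATION = "metadata-json-write-duration";

     // Only timers that were actually recorded get an entry, which keeps the new ones optional.
     private final Map<String, Long> nanosByName = new LinkedHashMap<>();

     /** Adds elapsed nanoseconds to the named timer, creating the entry on first use. */
     public void record(String name, long nanos) {
       nanosByName.merge(name, nanos, Long::sum);
     }

     /** Returns the recorded total, or empty if the timer was never recorded. */
     public OptionalLong durationNanos(String name) {
       Long nanos = nanosByName.get(name);
       return nanos == null ? OptionalLong.empty() : OptionalLong.of(nanos);
     }

     /** Read-only copy in insertion order, standing in for the serialized metrics map. */
     public Map<String, Long> asMap() {
       return Collections.unmodifiableMap(new LinkedHashMap<>(nanosByName));
     }

     public static void main(String[] args) {
       CommitMetricsSketch metrics = new CommitMetricsSketch();
       metrics.record(TOTAL_DURATION, 1_500_000L);
       metrics.record(MANIFEST_LIST_BUILD_DURATION, 400_000L);
       // metadata-json-write-duration is deliberately not recorded here.
       System.out.println(metrics.asMap());
       System.out.println(metrics.durationNanos(METADATA_JSON_WRITE_DURATION).isPresent());
     }
   }
   ```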
   
   ## Implementation details
   ### Manifest List Build time: where to add the timer.
   ```java
   public Snapshot apply() {
     refresh();
     Snapshot parentSnapshot = SnapshotUtil.latestSnapshot(base, targetBranch);
   
     validate(base, parentSnapshot);
     List<ManifestFile> manifests = apply(base, parentSnapshot);
   
     OutputFile manifestList = manifestListPath();
     // Start timer for manifest list build time
     ManifestListWriter writer =
         ManifestLists.write(
             ops.current().formatVersion(),
             manifestList,
             snapshotId(),
             parentSnapshotId,
             sequenceNumber,
             base.nextRowId());
   
     try (writer) {
       manifestLists.add(manifestList.location());
       ManifestFile[] manifestFiles = new ManifestFile[manifests.size()];
       Tasks.range(manifestFiles.length)
           .executeWith(workerPool())
           .run(index -> manifestFiles[index] = manifestsWithMetadata.get(manifests.get(index)));
       writer.addAll(Arrays.asList(manifestFiles));
       // End timer for manifest list build time
     } catch (IOException e) {
       throw new RuntimeIOException(e, "Failed to write manifest list file");
     }
   }
   ``` 
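
   One detail worth deciding is whether the timer should also cover 
`writer.close()`, since that is where buffered data is typically flushed. 
The sketch below shows one possible way to get that: declare the timer handle 
before the writer in the same try-with-resources, because Java closes 
resources in reverse declaration order. `NanoTimer`, `Timed`, and 
`FakeManifestListWriter` are hypothetical stand-ins written for this example, 
not Iceberg classes.

   ```java
   import java.util.ArrayList;
   import java.util.List;

   public class ManifestListTimingSketch {

     /** Hypothetical nanosecond timer; a stand-in, not Iceberg's metrics API. */
     public static final class NanoTimer {
       private long totalNanos;
       private long count;

       /** Starts a measurement that is recorded when the returned handle is closed. */
       public Timed start(List<String> log) {
         return new Timed(this, log);
       }

       public long totalNanos() {
         return totalNanos;
       }

       public long count() {
         return count;
       }
     }

     /** Handle that records the elapsed time on close(). */
     public static final class Timed implements AutoCloseable {
       private final NanoTimer timer;
       private final List<String> log;
       private final long startNanos = System.nanoTime();

       Timed(NanoTimer timer, List<String> log) {
         this.timer = timer;
         this.log = log;
       }

       @Override
       public void close() {
         timer.totalNanos += System.nanoTime() - startNanos;
         timer.count++;
         log.add("timer-stopped");
       }
     }

     /** Fake writer that only logs calls, so the ordering is observable. */
     public static final class FakeManifestListWriter implements AutoCloseable {
       private final List<String> log;

       FakeManifestListWriter(List<String> log) {
         this.log = log;
       }

       void addAll(List<String> manifests) {
         log.add("addAll:" + manifests.size());
       }

       @Override
       public void close() {
         log.add("writer-closed");
       }
     }

     /** Writes the fake manifest list and returns the observed event order. */
     public static List<String> writeManifestList(NanoTimer timer, List<String> manifests) {
       List<String> log = new ArrayList<>();
       // The timer is declared first, so it is closed last, after the writer has been closed.
       try (Timed ignored = timer.start(log);
           FakeManifestListWriter writer = new FakeManifestListWriter(log)) {
         writer.addAll(manifests);
       }
       return log;
     }

     public static void main(String[] args) {
       NanoTimer timer = new NanoTimer();
       System.out.println(writeManifestList(timer, List.of("m1.avro", "m2.avro")));
       System.out.println("recorded " + timer.count() + " measurement(s)");
     }
   }
   ```

   With this ordering the recorded duration includes the writer's close. If 
only the `addAll` call should be measured, the handle could instead be stopped 
explicitly inside the try block.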
   
   ### Metadata file write time: where to add the timer.
   - Inside `TableOperations.commit(...)` implementations:
     - Hadoop, inside 
[`HadoopTableOperations.java`](https://github.com/apache/iceberg/blob/7f14032be8c0538bfa59aba9951ec8a6001035e3/core/src/main/java/org/apache/iceberg/hadoop/HadoopTableOperations.java#L149-L163)
     - REST, inside 
[`RESTTableOperations.java`](https://github.com/apache/iceberg/blob/a2b8008da7bc26e03248a35eeee60d1cc7e8499d/core/src/main/java/org/apache/iceberg/rest/RESTTableOperations.java#L116-L151)
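
   Because a metadata write can fail, it may be useful to record the 
duration on both the success and the failure path. A minimal sketch of that, 
assuming a hypothetical `timeNanos` helper and a placeholder return value in 
place of a real metadata write (neither is part of Iceberg):

   ```java
   import java.util.function.LongConsumer;
   import java.util.function.Supplier;

   public class MetadataWriteTimingSketch {

     /**
      * Runs the action and reports its elapsed nanoseconds to the recorder,
      * including when the action throws.
      */
     public static <T> T timeNanos(LongConsumer recorder, Supplier<T> action) {
       long startNanos = System.nanoTime();
       try {
         return action.get();
       } finally {
         recorder.accept(System.nanoTime() - startNanos);
       }
     }

     public static void main(String[] args) {
       long[] recordedNanos = new long[1];
       // Placeholder for the step that serializes and writes the new metadata.json.
       String location = timeNanos(nanos -> recordedNanos[0] = nanos, () -> "placeholder-location");
       System.out.println(location + " took " + recordedNanos[0] + " ns");
     }
   }
   ```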
   
   
   
   ### Query engine
   
   Spark
   
   ### Willingness to contribute
   
   - [ ] I can contribute this improvement/feature independently
   - [ ] I would be willing to contribute this improvement/feature with 
guidance from the Iceberg community
   - [ ] I cannot contribute this improvement/feature at this time


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

