the-other-tim-brown commented on PR #13399: URL: https://github.com/apache/hudi/pull/13399#issuecomment-2957517297
> > In Spark, the executors will ask the timeline server to create markers for the files being created, which in turn will launch more Spark tasks if the Spark engine context is used.
>
> I checked the code in `MarkerDirState` for `HoodieEngineContext` usages: one is in the `MarkerDirState` constructor for marker sync, and another is for deleting all markers. Shouldn't both of these happen only once, because `MarkerDirState` itself is a singleton on the timeline server? And shouldn't these two avoid being called while creating markers for data files?

`MarkerDirState` is not a singleton; note that there is a [map](https://github.com/apache/hudi/blob/master/hudi-timeline-service/src/main/java/org/apache/hudi/timeline/service/handlers/MarkerHandler.java#L92) of these. We initialize those entries in `getMarkerDirState`, which is called from the path that creates the markers, which in turn runs when we create the data files.

In general, using Spark executors for basic tasks, like listing a directory, applying a simple function to the file statuses, and then deleting those files, is too much overhead. When you are running on a Spark cluster with other tasks in flight, you will wait for an executor to become available (FIFO scheduling by default) just to perform these basic operations.
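To make the non-singleton point concrete, here is a minimal sketch (illustrative class and method names, not the actual Hudi implementation) of a handler that keeps one lazily created state object per marker directory. The first marker request for a directory pays the initialization cost on the request path; later requests for the same directory reuse the cached state:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: one MarkerDirState per marker directory, mirroring
// the map kept in MarkerHandler. Names and behavior here are assumptions
// for illustration only.
public class MarkerHandlerSketch {
    // One state entry per marker directory; entries are created on demand.
    final Map<String, MarkerDirState> markerDirStates = new ConcurrentHashMap<>();

    static class MarkerDirState {
        static final AtomicInteger INIT_COUNT = new AtomicInteger();

        MarkerDirState(String dir) {
            // In the real code the constructor may sync existing markers;
            // that is the per-directory initialization cost paid on first use.
            INIT_COUNT.incrementAndGet();
        }
    }

    MarkerDirState getMarkerDirState(String markerDir) {
        // Lazily initialize: the first marker-create request for a directory
        // constructs its state; subsequent requests reuse it.
        return markerDirStates.computeIfAbsent(markerDir, MarkerDirState::new);
    }

    public static void main(String[] args) {
        MarkerHandlerSketch handler = new MarkerHandlerSketch();
        handler.getMarkerDirState("table/.temp/commit1");
        handler.getMarkerDirState("table/.temp/commit1"); // reused, no re-init
        handler.getMarkerDirState("table/.temp/commit2"); // new directory, new state
        System.out.println(MarkerDirState.INIT_COUNT.get()); // prints 2
    }
}
```

Because initialization happens per directory on the request path, any work the constructor delegates to the engine context is not a one-time cost, which is why avoiding Spark tasks there matters.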
