voonhous opened a new issue, #18964: URL: https://github.com/apache/hudi/issues/18964
### Describe the problem

`HoodieMetadataPayload#createRecordIndexUpdate` calls `TimelineUtils.parseDateFromInstantTime(instantTime).getTime()` for every record, even though the instant time is the same string for the entire commit. Each call runs a string-slicing compatibility fixup plus `LocalDateTime.parse` with a `DateTimeFormatter`. Per-record callers include RLI record generation for base files, revived keys, and record-index initialization.

The read side mirrors it: `HoodieTableMetadataUtil#getLocationFromRecordIndexInfo` runs `new Date(...)` plus `HoodieInstantTimeGenerator.formatDate` per looked-up record during record-index lookups, although the set of distinct instant times is tiny (one per commit).

For a 10M-record commit this is roughly 10M redundant date parses on the write side and the same again per upsert-tagging lookup on the read side.

### Proposed fix

- Add an overload `createRecordIndexUpdate(recordKey, partition, fileId, instantTimeMillis, fileIdEncoding)` and keep the String overload delegating to it after a single parse; batch callers parse once outside their per-record loops.
- In `getLocationFromRecordIndexInfo`, memoize the millis-to-instant-string formatting with a small bounded cache keyed by the millis value.

Output records and decoded locations are unchanged.

Will raise a PR for this.
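A rough sketch of the two ideas in the proposed fix, written against plain JDK classes rather than the Hudi code base. Everything here is hypothetical and for illustration only: `InstantTimeSketch`, `IndexEntry`, `createUpdates`, and `formatCached` are made-up names, the instant format is assumed to be millisecond-granularity `yyyyMMddHHmmssSSS` in UTC, and the cache bound of 16 is arbitrary. The real Hudi parser also handles legacy instant formats and a configurable time zone, which this sketch ignores.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the Hudi classes named in this issue; not the real API.
final class InstantTimeSketch {
  // Assumption: millisecond-granularity instants (yyyyMMddHHmmssSSS), interpreted as UTC.
  static final DateTimeFormatter FORMAT = new DateTimeFormatterBuilder()
      .appendPattern("yyyyMMddHHmmss")
      .appendValue(ChronoField.MILLI_OF_SECOND, 3)
      .toFormatter();

  static final int MAX_CACHE_ENTRIES = 16; // arbitrary bound for the sketch
  static int parseCalls = 0;               // instrumentation only
  static int formatCalls = 0;              // instrumentation only

  // Access-ordered map that evicts the least recently used entry. Not thread-safe.
  static final Map<Long, String> FORMAT_CACHE =
      new LinkedHashMap<Long, String>(MAX_CACHE_ENTRIES, 0.75f, true) {
        @Override
        protected boolean removeEldestEntry(Map.Entry<Long, String> eldest) {
          return size() > MAX_CACHE_ENTRIES;
        }
      };

  // Minimal placeholder for a record-index entry.
  static final class IndexEntry {
    final String recordKey;
    final String fileId;
    final long instantTimeMillis;

    IndexEntry(String recordKey, String fileId, long instantTimeMillis) {
      this.recordKey = recordKey;
      this.fileId = fileId;
      this.instantTimeMillis = instantTimeMillis;
    }
  }

  static long parseMillis(String instantTime) {
    parseCalls++;
    return LocalDateTime.parse(instantTime, FORMAT).toInstant(ZoneOffset.UTC).toEpochMilli();
  }

  // Millis overload: no parsing on the per-record path.
  static IndexEntry createUpdate(String recordKey, String fileId, long instantTimeMillis) {
    return new IndexEntry(recordKey, fileId, instantTimeMillis);
  }

  // String overload kept for existing callers; parses once, then delegates.
  static IndexEntry createUpdate(String recordKey, String fileId, String instantTime) {
    return createUpdate(recordKey, fileId, parseMillis(instantTime));
  }

  // Batch caller: the parse is hoisted out of the per-record loop.
  static List<IndexEntry> createUpdates(List<String> recordKeys, String fileId, String instantTime) {
    long millis = parseMillis(instantTime);
    List<IndexEntry> entries = new ArrayList<>(recordKeys.size());
    for (String key : recordKeys) {
      entries.add(createUpdate(key, fileId, millis));
    }
    return entries;
  }

  // Read side: format each distinct millis value once, then serve it from the cache.
  static String formatCached(long millis) {
    String cached = FORMAT_CACHE.get(millis);
    if (cached == null) {
      formatCalls++;
      cached = LocalDateTime.ofInstant(Instant.ofEpochMilli(millis), ZoneOffset.UTC).format(FORMAT);
      FORMAT_CACHE.put(millis, cached);
    }
    return cached;
  }
}
```

With this shape, a batch of N records triggers one parse instead of N, and repeated lookups for the same commit trigger one format call. A real implementation would likely need a thread-safe cache if lookups run concurrently.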
