voonhous commented on code in PR #18965:
URL: https://github.com/apache/hudi/pull/18965#discussion_r3407501898
##########
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataPayload.java:
##########
@@ -672,8 +697,8 @@ public static HoodieRecord<HoodieMetadataPayload> createRecordIndexUpdate(String
fileIndex = Integer.parseInt(fileId.substring(index + 1));
}
} catch (Exception e) {
- throw new HoodieMetadataException(String.format("Invalid UUID or index: fileID=%s, partition=%s, instantTime=%s",
- fileId, partition, instantTime), e);
+ throw new HoodieMetadataException(String.format("Invalid UUID or index: fileID=%s, partition=%s, instantTimeMillis=%d",
Review Comment:
Can we keep `instantTimeMillis`? Our initial worry was that timezone issues
would cause inconsistencies. Sure, I removed that, but we still need a way to
attribute which instant this filegroup belongs to, right? Without the instant,
this error message is useless.
The attribution is trivial. Users can match on the minutes, seconds, and
milliseconds fields after converting `instantTimeMillis` to whatever
timezone anyway. The odds of a clash should not be THAT high when looking at
a single filegroup.
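A minimal sketch of that matching, for illustration only. It assumes the instant time is rendered with a `yyyyMMddHHmmssSSS` pattern; `InstantTimeMillisDemo`, `render`, and `minuteSecondMillis` are hypothetical names and not part of Hudi, and the epoch value is made up.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

public class InstantTimeMillisDemo {
    // Assumed instant-time pattern; verify against the Hudi version in use.
    private static final DateTimeFormatter INSTANT_FORMAT =
        DateTimeFormatter.ofPattern("yyyyMMddHHmmssSSS");

    // Render epoch millis as an instant-time string in the given zone.
    static String render(long instantTimeMillis, String zoneId) {
        return INSTANT_FORMAT
            .withZone(ZoneId.of(zoneId))
            .format(Instant.ofEpochMilli(instantTimeMillis));
    }

    // Last 7 characters: minutes (2) + seconds (2) + milliseconds (3).
    static String minuteSecondMillis(String instantTime) {
        return instantTime.substring(instantTime.length() - 7);
    }

    public static void main(String[] args) {
        long instantTimeMillis = 1700000000123L; // made-up value from an error message
        for (String zone : new String[] {"UTC", "America/Los_Angeles", "Asia/Singapore"}) {
            String rendered = render(instantTimeMillis, zone);
            System.out.println(zone + " -> " + rendered
                + " (suffix " + minuteSecondMillis(rendered) + ")");
        }
    }
}
```

For zones at a whole-hour offset from UTC, the suffix is identical across zones. Zones with a 30- or 45-minute offset (for example `Asia/Kolkata`) shift the minutes field, so only the seconds and milliseconds line up there.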
##########
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataPayload.java:
##########
@@ -672,8 +697,8 @@ public static HoodieRecord<HoodieMetadataPayload> createRecordIndexUpdate(String
fileIndex = Integer.parseInt(fileId.substring(index + 1));
}
} catch (Exception e) {
- throw new HoodieMetadataException(String.format("Invalid UUID or index: fileID=%s, partition=%s, instantTime=%s",
- fileId, partition, instantTime), e);
+ throw new HoodieMetadataException(String.format("Invalid UUID or index: fileID=%s, partition=%s, instantTimeMillis=%d",
Review Comment:
As mentioned, if you want to clean up the error message, open another PR to
fix things and justify why. This is a pure performance improvement PR; let's
scope our changes that way.
The demotion from the actual `instantTime` to `instantTimeMillis` is a
reasonable sacrifice given that your concern is JVM timezone differences. I
would argue that it actually gives us more information, since a
timezone-agnostic value is printed out. This removes a lot of ambiguity, and if
there really are timezone issues, devs can put two and two together and infer
it themselves.