nsivabalan commented on code in PR #11514: URL: https://github.com/apache/hudi/pull/11514#discussion_r1663010943
##########
rfc/rfc-78/rfc-78.md:
##########
@@ -179,6 +183,9 @@ Let’s reiterate what we need to support w/ 0.16.0 reader. On a high level, we need to ensure commit metadata in either format (avro or json) need to be supported. And “cluster” and completed “compaction”s need to be readable in 0.16.0 reader.
 - But the challenging part is, for every commit metadata, we might have to deserialize to avro and on exception try json. We could deduce the format using completion file name, but as per current code layering, deserialization methods does not know the file name( method takes byte[]).
 - Similarly for clustering commits, unless we have some kind of watermark, we have to keep considering replace commits as well in the FSV building logic to ensure we do not miss any clustering commits.
+- To be decided: We also need to use diff LogFileComparators depending on the file slice's base instant time. If the file slices's base instant time is < table upgrade commit time, we use older log file comparator to order log files. but if file slice's base instant time > table upgrade commit time, we have to use new log file comparator (completion time). Tricky part is if a file slice contains a mix of log files.
+  This fix definitely needs to go into 1.x, but whether we wanted to port this change to 0.16.x or not is yet to be discussed and decided.

 Lets zoom in a bit to see what will happen if a single file slice could contain a mix of log files using 1.x reader(this is a basic requirement to support 0.16.x tables in 1.x).

Review Comment:
   We need to fix the 1.x reader to enforce completion-time-based log file ordering within a file slice. After the fix, as we understand it, the same logic should also work for a file slice written entirely in 0.x: the completion time will match for all of its log files, so the log version then determines the ordering. We need a lot of tests covering all of these scenarios.
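
   To make the proposed ordering concrete, here is a minimal sketch of a comparator that orders by completion time and falls back to log version on ties. This is not Hudi's actual comparator: `LogFileOrderingSketch`, `LogFile`, and its fields are hypothetical names made up for illustration, and completion times are assumed to be fixed-width timestamp strings that compare correctly as plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class LogFileOrderingSketch {

    // Hypothetical stand-in for a log file in a file slice; not a Hudi class.
    static final class LogFile {
        final String completionTime; // assumed fixed-width timestamp string
        final int logVersion;

        LogFile(String completionTime, int logVersion) {
            this.completionTime = completionTime;
            this.logVersion = logVersion;
        }

        @Override
        public String toString() {
            return completionTime + "/v" + logVersion;
        }
    }

    // Order by completion time first; when completion times are equal,
    // the log version decides.
    static final Comparator<LogFile> BY_COMPLETION_TIME_THEN_VERSION =
        Comparator.comparing((LogFile f) -> f.completionTime)
                  .thenComparingInt(f -> f.logVersion);

    // Returns a sorted copy and leaves the input list untouched.
    static List<LogFile> order(List<LogFile> logFiles) {
        List<LogFile> sorted = new ArrayList<>(logFiles);
        sorted.sort(BY_COMPLETION_TIME_THEN_VERSION);
        return sorted;
    }

    public static void main(String[] args) {
        // All log files share one completion time, as the comment above
        // expects for a slice written entirely in 0.x: version decides.
        System.out.println(order(Arrays.asList(
            new LogFile("20240101000000000", 3),
            new LogFile("20240101000000000", 1),
            new LogFile("20240101000000000", 2))));

        // Mixed slice: a lower log version can sort later if it completed later.
        System.out.println(order(Arrays.asList(
            new LogFile("20240103000000000", 1),
            new LogFile("20240102000000000", 2))));
    }
}
```

   The test scenarios mentioned above would then cover at least three cases: all completion times equal, all distinct, and a mix within a single file slice.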