Github user steveloughran commented on the issue:

    https://github.com/apache/spark/pull/18979

Noted :)

@dongjoon-hyun: is the issue with ORC that if there's nothing to write, it doesn't generate a file (so avoiding the problem where you sometimes get 0-byte ORC files and things downstream fail)?

If so, the warning message which @gatorsmile has proposed is potentially going to mislead people into worrying about a problem which isn't there, and the numFiles metric is going to mislead as well. I'm starting to worry about how noisy the log would be, both there and when working with S3 when it's exhibiting delayed visibility (rarer).

1. What if this patch just logged at debug: less noise, but still something there if people are trying to debug a mismatch?
2. If there's no file found, numFiles doesn't get incremented.
3. I count the number of files actually submitted.
4. And in `getFinalStats()`, log at info if there is a mismatch.

This would line things up in future for actually returning the list of expected vs actual files as a metric where it could be reported.
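
To make the four steps above concrete, here is a rough, language-neutral sketch in Python. It is not the Spark code (which is Scala) and not the actual patch: `FileCountTracker`, `WriteStats`, `new_file` and `submitted_files` are hypothetical names invented for illustration, and only `numFiles` and `getFinalStats()` correspond to names mentioned in this comment.

```python
import logging
import os
from dataclasses import dataclass

log = logging.getLogger("write_stats_sketch")


@dataclass
class WriteStats:
    """Hypothetical result type: what the tracker reports at the end."""
    submitted_files: int  # files the writer said it would produce
    num_files: int        # files actually found (the numFiles metric)
    num_bytes: int        # total size of the files that were found


class FileCountTracker:
    """Illustrative tracker only; not Spark's actual stats tracker."""

    def __init__(self):
        self.submitted_files = 0
        self.num_files = 0
        self.num_bytes = 0

    def new_file(self, path):
        # Step 3: always count the file as submitted.
        self.submitted_files += 1
        try:
            size = os.path.getsize(path)
        except FileNotFoundError:
            # Step 1: a missing file is only logged at debug, to keep noise down.
            log.debug("File %s not found after write", path)
            # Step 2: numFiles is not incremented for a missing file.
            return
        self.num_files += 1
        self.num_bytes += size

    def get_final_stats(self):
        # Step 4: a single info-level message, and only if the counts disagree.
        if self.submitted_files != self.num_files:
            log.info("Expected %d files, but only saw %d",
                     self.submitted_files, self.num_files)
        return WriteStats(self.submitted_files, self.num_files, self.num_bytes)
```

Keeping both counters in the returned stats is what would later allow expected vs actual files to be surfaced as a metric, rather than only appearing in the log.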