Github user steveloughran commented on the issue:

    https://github.com/apache/spark/pull/18979
  
    Noted :)
    @dongjoon-hyun : is the issue with ORC that if there's nothing to write, it 
doesn't generate a file (which avoids the problem where you sometimes get 0-byte 
ORC files and things downstream fail)?
    
    If so, the warning message which @gatorsmile has proposed could mislead 
people into worrying about a problem which isn't there, and the numFiles metric 
would be misleading too.
    
    I'm starting to worry about how noisy the log would be, both in that case 
and, more rarely, when working with S3 while it's showing delayed visibility of 
newly written files.
    
    1. What if this patch just logged at debug? Less noise, but still something 
there if people are trying to debug a mismatch.
    1. If there's no file found, numFiles doesn't get incremented.
    1. I count the number of files actually submitted.
    1. And in `getFinalStats()`, log at info if there is a mismatch.
    
    This would line things up for, in future, actually returning the list of 
expected vs. actual files as a metric where it could be reported.
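
A minimal sketch of the counting proposed above, written as standalone Java rather than against Spark's real tracker classes. The class `FileCountTracker`, the method `fileSubmitted`, the `submittedFiles` field, and the use of `java.util.logging` are all assumptions made for illustration; only `numFiles` and `getFinalStats()` are names taken from the discussion, and the return type used here is likewise invented.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in for a write stats tracker; NOT Spark's actual class.
class FileCountTracker {
    private static final Logger LOG =
        Logger.getLogger(FileCountTracker.class.getName());

    private int submittedFiles = 0; // files the writer was asked to produce
    private int numFiles = 0;       // files actually found after the write

    // Called once per file handed to the writer.
    void fileSubmitted(Path path) {
        submittedFiles++;
        if (Files.exists(path)) {
            numFiles++;
        } else {
            // Debug level only: a missing file can be legitimate,
            // e.g. a format that writes nothing when there are no rows.
            LOG.log(Level.FINE, "No file found at {0}", path);
        }
    }

    int getSubmittedFiles() { return submittedFiles; }

    int getNumFiles() { return numFiles; }

    // Analogue of getFinalStats(): report any mismatch once, at info.
    int getFinalStats() {
        if (submittedFiles != numFiles) {
            LOG.log(Level.INFO, "Expected {0} files, but only saw {1}",
                    new Object[] {submittedFiles, numFiles});
        }
        return numFiles;
    }
}
```

Keeping both counters around is what would later allow expected vs. actual files to be surfaced as a metric instead of only a log line.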

