Github user mridulm commented on the issue:

    https://github.com/apache/spark/pull/19487
  
    @HyukjinKwon That was exactly the initial solution I tested locally when we 
saw the problem with Phoenix.
    The reason to expand it was twofold:
    a) This change preserves the existing behavior from 2.1.
    b) There are three cases where `new Path` can throw an exception, and I would 
be special-casing only two of them (null and empty string).
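
    To make the three failure modes in (b) concrete, here is a minimal sketch (my 
own illustration, not code from the patch); `java.net.URI` stands in for 
`org.apache.hadoop.fs.Path` so the snippet is self-contained, and the 
`safePath` helper name is hypothetical:

```scala
import java.net.URI
import scala.util.Try

// Hypothetical sketch (not the actual Spark code): the configured output
// dir can fail to yield a Path in three ways - a null value, an empty
// string, or a malformed string. java.net.URI stands in here for
// org.apache.hadoop.fs.Path, whose constructor rejects all three.
object OutputDirCheck {
  // Some(uri) only when the configured value parses cleanly; None for
  // null, empty, or malformed values, so callers can skip manual promotion.
  def safePath(dir: String): Option[URI] =
    Option(dir)                       // case 1: null
      .filter(_.nonEmpty)             // case 2: empty string
      .flatMap(s => Try(new URI(s)).toOption) // case 3: malformed string
}
```

    Special-casing only `null` and `""` would leave the third (malformed) case 
to surface as an exception from the `Path` constructor.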
    
    My understanding is that Hadoop does not require creating a `Path` out of 
`mapred.output.dir`/`mapreduce.output.fileoutputformat.outputdir` - it is up to 
the individual OutputFormat/Committer to handle it appropriately (unless I am 
missing something here - @steveloughran would be able to opine better on this).
    
    Expecting 
`mapred.output.dir`/`mapreduce.output.fileoutputformat.outputdir` to be a valid 
path is a behavior change we introduced in 2.2 in 
`HadoopMapReduceCommitProtocol`: for some cases we now directly handle promotion 
of files to their final location in `commitJob`, and cleanup in `abortJob`. This 
is done in addition to what the committer does (which is invoked before our 
code).
    
    For committers which are not `Path` based, this promotion of output by 
Spark does not apply (there is no path to promote!) - which is what @szhem's 
patch was fixing - except that it handled only null.
    
    
    Having said all this - if the expectation in Hadoop is that 
`mapred.output.dir`/`mapreduce.output.fileoutputformat.outputdir`, when 
specified, must be a valid `Path`, I will happily change it to special-case 
only `null` and `""`.
    Unless we get that clarity, IMO we should preserve existing behavior and be 
defensive about when we attempt manual promotion.
    
    Please note that if an invalid value is provided for 
`mapred.output.dir`/`mapreduce.output.fileoutputformat.outputdir`, the 
corresponding output format or committer will throw an appropriate exception 
(as in the case of MR, Pig, or Hive).

