Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/15382
  
    Update on my thinking:
    
    Docs say the default should be "spark-warehouse" in the local working directory. The original code did establish a default on the local file system, though in the home dir rather than the working dir. I still favor fixing that, but it's a small, orthogonal question.
    
    Although the default appears intended to be on the local FS, I don't think it has to be. @avulanov mentions it might be on S3; by comparison, the Hive warehouse dir is on HDFS by default. I have no evidence that this has been prohibited, so I don't think we should assume this is a local path.
    
    It seems simplest to default to "spark-warehouse" in the working dir of whatever file system is in use, be it local or HDFS. That means SQLConf shouldn't specify a default with any URI scheme or path. SQLConf then has to resolve it, though, and at the moment there's no good way to plumb a Hadoop Configuration through. (I just made a new default one in here for the moment; see the sketch below.)
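
    To make that concrete, here's a minimal sketch of the resolution I mean. It assumes a plain `new Configuration()`, since SQLConf has no Hadoop conf to hand:

    ```scala
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    // No Spark-specific Hadoop conf is available in SQLConf, so fall
    // back to a default Configuration; only fs.defaultFS matters here.
    val hadoopConf = new Configuration()
    val fs = FileSystem.get(hadoopConf)

    // Qualify the bare relative default against the default FS and its
    // working directory: local FS or HDFS, whatever fs.defaultFS says.
    val warehousePath = fs.makeQualified(new Path("spark-warehouse"))
    ```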
    
    Then we're back to an interesting detail: Hadoop Path's `toUri` won't add a trailing slash in the case of a local file. It probably can't in general, since it can't always tell whether a path is a directory. URIs from `java.io.File`, however, will have a trailing slash for an existing directory. It's probably best to leave that behavior alone and adjust the tests to not expect a trailing slash, as the original code seemed to (?)
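
    For illustration (the paths here are made up), the difference looks like:

    ```scala
    import java.io.File
    import org.apache.hadoop.fs.Path

    // Hadoop's Path can't know whether a path names a directory,
    // so toUri never appends a slash:
    new Path("/tmp/spark-warehouse").toUri.toString
    // => "/tmp/spark-warehouse"

    // java.io.File#toURI appends one, but only for an existing directory:
    new File("/tmp/spark-warehouse").toURI.toString
    // => "file:/tmp/spark-warehouse/" (when the directory exists)
    ```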
    
    Does this still work on Windows? As far as I can tell it would.
    
    Is it OK to call `FileSystem.get(new Configuration())` in `SQLConf`? There's no easy access to the actual conf, but it's only reading the default file system, which shouldn't depend on Spark-specific Hadoop conf.
    
    I think this change actually _restores_ the ability to specify a warehouse path with a URI scheme. Now, the value is interpreted as a local file path only if you write it as "file:/foo". However, there's no way to write a relative file URI, so there's no way to guarantee that a default of "spark-warehouse" is a relative path on the local FS. After this change, HDFS users would see it correctly default to "spark-warehouse" in an HDFS working directory.

