[ https://issues.apache.org/jira/browse/SPARK-23641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrei Badea updated SPARK-23641:
---------------------------------
    Priority: Minor  (was: Major)

> Wrong username when making relative path to Hive LOAD DATA absolute
> -------------------------------------------------------------------
>
>                 Key: SPARK-23641
>                 URL: https://issues.apache.org/jira/browse/SPARK-23641
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: Andrei Badea
>            Priority: Minor
>
> We have an application deployed in yarn-cluster mode.
> At some point, the application invokes
> {noformat}
> spark.sql("LOAD DATA INPATH some/relative/path ...")
> {noformat}
> in an attempt to add that directory to a Hive table. The relative path should
> be interpreted relative to the home directory of the user who ran the Spark
> application (this is what the Hive shell does).
> The command runs without failing, but the directory is not added to the
> table. Investigation showed that
> {{org.apache.spark.sql.execution.command.LoadDataCommand}} attempts to make
> the path absolute by prepending
> {{s"/user/${System.getProperty("user.name")}"}}. Since the application was
> deployed in yarn-cluster mode, the value of the {{user.name}} property is
> "yarn". This is illustrated by the following message in the driver logs:
> {noformat}
> INFO metadata.Hive: No sources specified to move:
> hdfs://namenode:8020/user/yarn/some/relative/path
> {noformat}
> Interestingly, the same Spark application writes the data to the relative
> path (prior to calling LOAD DATA), and that write makes the path absolute as
> expected. It uses {{Path.makeQualified()}}, which resolves the path against
> {{FileSystem.getWorkingDirectory}}, which by default is
> {{FileSystem.getHomeDirectory}} (and that apparently is initialized early
> enough, on the machine on which the application is submitted).
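
To make the contrast described in the report concrete, here is a minimal sketch of the two ways of turning a relative path into an absolute one. It is not the code of {{LoadDataCommand}} and not a proposed patch: the object name {{LoadPathResolution}} and both helper names are hypothetical, and the second helper assumes that the Hadoop {{FileSystem}} home directory belongs to the submitting user, as the report's observation about the earlier write suggests.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical helper object, for illustration only.
object LoadPathResolution {

  // Mirrors the behaviour described in the report: prefix the relative path
  // with /user/<user.name>, where user.name is the JVM system property
  // (the OS user running the driver, e.g. "yarn" in yarn-cluster mode).
  def viaUserNameProperty(relative: String): String =
    s"/user/${System.getProperty("user.name")}/$relative"

  // Alternative: resolve the relative path against the file system's home
  // directory and let Hadoop add the scheme and authority.
  // Assumption: fs.getHomeDirectory reflects the user who submitted the job.
  def viaFileSystem(fs: FileSystem, relative: String): Path = {
    val p = new Path(relative)
    val absolute = if (p.isAbsolute) p else new Path(fs.getHomeDirectory, p)
    fs.makeQualified(absolute)
  }

  def main(args: Array[String]): Unit = {
    // Uses whatever default file system the local Hadoop configuration names.
    val fs = FileSystem.get(new Configuration())
    val relative = "some/relative/path"
    println(viaUserNameProperty(relative))
    println(viaFileSystem(fs, relative))
  }
}
```

Running both helpers in the driver of a yarn-cluster application and comparing the printed paths would show whether the two strategies disagree in a given deployment.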