[ https://issues.apache.org/jira/browse/SPARK-40286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17605509#comment-17605509 ]
Drew commented on SPARK-40286:
------------------------------

[https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Loadingfilesintotables]

If the keyword LOCAL is _not_ specified, then Hive will either use the full URI of {_}filepath{_}, if one is specified, or will apply the following rules:
* If the scheme or authority is not specified, Hive will use the scheme and authority from the Hadoop configuration variable {{fs.default.name}}, which specifies the NameNode URI.
* If the path is not absolute, Hive will interpret it relative to {{/user/<username>}}.
* Hive will _move_ the files addressed by _filepath_ into the table (or partition).

> Load Data from S3 deletes data source file
> ------------------------------------------
>
>                 Key: SPARK-40286
>                 URL: https://issues.apache.org/jira/browse/SPARK-40286
>             Project: Spark
>          Issue Type: Question
>          Components: Documentation
>    Affects Versions: 3.2.1
>            Reporter: Drew
>            Priority: Major
>
> Hello,
> I'm using Spark to [load data|https://spark.apache.org/docs/latest/sql-ref-syntax-dml-load.html] into a Hive table through PySpark, and when I load data from a path in Amazon S3, the original file is removed from the directory. The file is found and its contents populate the table. I also tried adding the {{LOCAL}} clause, but that throws an error when looking for the file. The documentation doesn't explicitly state that this is the intended behavior.
> Thanks in advance!
> {code:java}
> spark.sql("CREATE TABLE src (key INT, value STRING) STORED AS textfile")
> spark.sql("LOAD DATA INPATH 's3://bucket/kv1.txt' OVERWRITE INTO TABLE src"){code}
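The move semantics quoted above explain the reported behavior: without {{LOCAL}}, {{LOAD DATA}} *moves* the source file into the table's location rather than copying it, so the original path no longer exists afterwards. A minimal pure-Python sketch of that distinction, using hypothetical local directories as stand-ins for {{s3://bucket/}} and the Hive warehouse:

```python
import os
import shutil
import tempfile

# Hypothetical local stand-ins for s3://bucket/ and the Hive warehouse dir.
staging = tempfile.mkdtemp()
warehouse = tempfile.mkdtemp()

src = os.path.join(staging, "kv1.txt")
with open(src, "w") as f:
    f.write("1\tone\n2\ttwo\n")

# LOAD DATA INPATH behaves like a move: the source file is gone afterwards.
shutil.move(src, os.path.join(warehouse, "kv1.txt"))
assert not os.path.exists(src)

# Copying first preserves the original, at the cost of an extra copy step.
src2 = os.path.join(staging, "kv2.txt")
with open(src2, "w") as f:
    f.write("3\tthree\n")
scratch = os.path.join(staging, "kv2_scratch.txt")
shutil.copy2(src2, scratch)  # load the scratch copy instead of the original
shutil.move(scratch, os.path.join(warehouse, "kv2.txt"))
assert os.path.exists(src2)  # original survives
```

If keeping the S3 object is the goal, one non-destructive alternative (an assumption about the reporter's setup, not something the docs mandate) is to skip {{LOAD DATA}} entirely and instead read the file into a DataFrame and write it into the table, e.g. {{spark.read}} followed by {{DataFrameWriter.insertInto}}, which leaves the source object in place.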