[ https://issues.apache.org/jira/browse/SPARK-40286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17605509#comment-17605509 ]
Drew commented on SPARK-40286:
------------------------------

[https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Loadingfilesintotables]

If the keyword LOCAL is _not_ specified, then Hive will either use the full URI of {_}filepath{_}, if one is specified, or will apply the following rules:
* If the scheme or authority is not specified, Hive will use the scheme and authority from the Hadoop configuration variable {{fs.default.name}}, which specifies the NameNode URI.
* If the path is not absolute, Hive will interpret it relative to {{/user/<username>}}.
* Hive will _move_ the files addressed by _filepath_ into the table (or partition).

> Load Data from S3 deletes data source file
> ------------------------------------------
>
>                 Key: SPARK-40286
>                 URL: https://issues.apache.org/jira/browse/SPARK-40286
>             Project: Spark
>          Issue Type: Question
>          Components: Documentation
>    Affects Versions: 3.2.1
>            Reporter: Drew
>            Priority: Major
>
> Hello,
> I'm using Spark to [load data|https://spark.apache.org/docs/latest/sql-ref-syntax-dml-load.html] into a Hive table through PySpark, and when I load data from a path in Amazon S3, the original file is removed from the directory. The file is found and its contents populate the table. I also tried adding the {{LOCAL}} clause, but that throws an error when looking for the file. The documentation doesn't explicitly state that this is the intended behavior.
> Thanks in advance!
> {code:java}
> spark.sql("CREATE TABLE src (key INT, value STRING) STORED AS textfile")
> spark.sql("LOAD DATA INPATH 's3://bucket/kv1.txt' OVERWRITE INTO TABLE src"){code}
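The move semantics quoted above explain the reported behavior: without {{LOCAL}}, {{LOAD DATA}} *moves* the source file into the table's location rather than copying it, so the original path no longer exists afterwards. A minimal pure-Python sketch of that distinction, using hypothetical local directories as stand-ins for {{s3://bucket/}} and the Hive warehouse:

```python
import os
import shutil
import tempfile

# Hypothetical local stand-ins for s3://bucket/ and the Hive warehouse dir.
staging = tempfile.mkdtemp()
warehouse = tempfile.mkdtemp()

src = os.path.join(staging, "kv1.txt")
with open(src, "w") as f:
    f.write("1\tone\n2\ttwo\n")

# LOAD DATA INPATH behaves like a move: the source file is gone afterwards.
shutil.move(src, os.path.join(warehouse, "kv1.txt"))
assert not os.path.exists(src)

# Copying first preserves the original, at the cost of an extra copy step.
src2 = os.path.join(staging, "kv2.txt")
with open(src2, "w") as f:
    f.write("3\tthree\n")
scratch = os.path.join(staging, "kv2_scratch.txt")
shutil.copy2(src2, scratch)  # load the scratch copy instead of the original
shutil.move(scratch, os.path.join(warehouse, "kv2.txt"))
assert os.path.exists(src2)  # original survives
```

If keeping the S3 object is the goal, one non-destructive alternative (an assumption about the reporter's setup, not something the docs mandate) is to skip {{LOAD DATA}} entirely and instead read the file into a DataFrame and write it into the table, e.g. {{spark.read}} followed by {{DataFrameWriter.insertInto}}, which leaves the source object in place.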