[jira] [Commented] (SPARK-40286) Load Data from S3 deletes data source file

Sean R. Owen (Jira) Wed, 31 Aug 2022 12:09:07 -0700


    [ 
https://issues.apache.org/jira/browse/SPARK-40286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17598574#comment-17598574
 ]


Sean R. Owen commented on SPARK-40286:
--------------------------------------

I could be completely wrong, but if that's how this is meant to work, I'd be 
just as surprised as you are. If so, it needs to be in the docs.

> Load Data from S3 deletes data source file
> ------------------------------------------
>
>                 Key: SPARK-40286
>                 URL: https://issues.apache.org/jira/browse/SPARK-40286
>             Project: Spark
>          Issue Type: Question
>          Components: Documentation
>    Affects Versions: 3.2.1
>            Reporter: Drew
>            Priority: Major
>
> Hello,
> I'm using Spark to [load 
> data|https://spark.apache.org/docs/latest/sql-ref-syntax-dml-load.html] into 
> a Hive table through PySpark, and when I load data from a path in Amazon S3, 
> the original file is deleted from its directory. The file is found, and the 
> table is populated with its data. I also tried adding the `LOCAL` clause, but 
> that throws an error when looking for the file. The documentation doesn't 
> explicitly state that this is the intended behavior.
> Thanks in advance!
> {code:java}
> spark.sql("CREATE TABLE src (key INT, value STRING) STORED AS textfile")
> spark.sql("LOAD DATA INPATH 's3://bucket/kv1.txt' OVERWRITE INTO TABLE src")
> {code}
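For readers hitting the same behavior, here is a minimal sketch of one possible workaround: load a staged copy so that the original file survives. It assumes that a non-LOCAL load moves the source file into the table location, which is how Hive's documentation describes LOAD DATA; whether Spark intends the same for S3 paths is exactly what this issue asks. The sketch uses the local filesystem rather than S3, and `load_data_inpath`, `load_from_staged_copy`, and `demo` are hypothetical names invented for illustration, not Spark or Hive APIs.

```python
import shutil
import tempfile
from pathlib import Path


def load_data_inpath(src: Path, table_dir: Path) -> Path:
    """Hypothetical stand-in for a non-LOCAL LOAD DATA.

    Assumption: the load MOVES the source file into the table directory,
    so the file no longer exists at its original path afterwards.
    """
    table_dir.mkdir(parents=True, exist_ok=True)
    dest = table_dir / src.name
    shutil.move(str(src), str(dest))
    return dest


def load_from_staged_copy(src: Path, staging_dir: Path, table_dir: Path) -> Path:
    """Copy the source to a staging area and load the copy instead.

    The staged copy is consumed by the load; the original stays in place.
    """
    staging_dir.mkdir(parents=True, exist_ok=True)
    staged = staging_dir / src.name
    shutil.copy2(src, staged)
    return load_data_inpath(staged, table_dir)


def demo() -> dict:
    """Run both approaches in a temporary directory and report what survives."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        source = root / "bucket" / "kv1.txt"
        source.parent.mkdir()
        source.write_text("1\tval_1\n")

        # Workaround: load a staged copy, keeping the original file.
        loaded = load_from_staged_copy(
            source, root / "staging", root / "warehouse" / "src"
        )
        result = {
            "source_kept": source.exists(),
            "staged_consumed": not (root / "staging" / "kv1.txt").exists(),
            "loaded": loaded.read_text(),
        }

        # Direct load: under the move assumption, the original disappears.
        load_data_inpath(source, root / "warehouse" / "src_direct")
        result["source_after_direct_load"] = source.exists()
        return result


if __name__ == "__main__":
    print(demo())
```

On S3 the equivalent would be copying the object to a scratch prefix before running the load against that copy; reading the file into a DataFrame and inserting it into the table is another option that never touches the source.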



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
