[jira] [Commented] (SPARK-40286) Load Data from S3 deletes data source file

Sean R. Owen (Jira) Wed, 31 Aug 2022 10:50:09 -0700


    [ 
https://issues.apache.org/jira/browse/SPARK-40286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17598542#comment-17598542
 ]


Sean R. Owen commented on SPARK-40286:
--------------------------------------

Where is src stored? LOAD DATA should not affect the source, but, you are 
OVERWRITEing whatever is in src's storage.


> Load Data from S3 deletes data source file
> ------------------------------------------
>
>                 Key: SPARK-40286
>                 URL: https://issues.apache.org/jira/browse/SPARK-40286
>             Project: Spark
>          Issue Type: Question
>          Components: Documentation
>    Affects Versions: 3.2.1
>            Reporter: Drew
>            Priority: Major
>
> Hello, 
> I'm using spark to [load 
> data|https://spark.apache.org/docs/latest/sql-ref-syntax-dml-load.html] into 
> a hive table through Pyspark, and when I load data from a path in Amazon S3, 
> the original file is getting wiped from the Directory. The file is found, and 
> is populating the table with data. I also tried to add the `Local` clause but 
> that throws an error when looking for the file. When looking through the 
> documentation it doesn't explicitly state that this is the intended behavior.
> Thanks in advance!
> {code:java}
> spark.sql("CREATE TABLE src (key INT, value STRING) STORED AS textfile")
> spark.sql("LOAD DATA INPATH 's3://bucket/kv1.txt' OVERWRITE INTO TABLE 
> src"){code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-40286) Load Data from S3 deletes data source file

Reply via email to