[ https://issues.apache.org/jira/browse/SPARK-13699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15183996#comment-15183996 ]

Suresh Thalamati commented on SPARK-13699:
------------------------------------------

Thank you for providing the reproduction; I was able to reproduce the issue. The problem is that you are trying to overwrite a table that is also being read by the data frame. This is not allowed, and it should fail with an error (in some cases I do get org.apache.spark.sql.AnalysisException: Cannot overwrite table `t1` that is also being read from). I think this usage should raise an error consistently.
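For reference, the self-overwrite pattern that triggers the error looks roughly like this (a minimal sketch; the table name `t1` is taken from the error message above and the filter column is illustrative):

{code}
import org.apache.spark.sql.SaveMode

// Read from a Hive table, transform, then attempt to overwrite the same table.
val df = sqlContext.table("t1")          // t1 is now a source of the plan
val updated = df.filter("currind = 'Y'")

// Fails: t1 is both the source and the target of the write.
// org.apache.spark.sql.AnalysisException: Cannot overwrite table `t1` that is also being read from
updated.write.mode(SaveMode.Overwrite).saveAsTable("t1")
{code}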

Truncate is an interesting option, especially with the JDBC data source. But it will not address the problem you are running into; it would hit the same issue as Overwrite.
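One common workaround (sketched below; the staging table name is illustrative, not something from this issue) is to materialize the result into a separate table first, so the final overwrite no longer reads from its own target:

{code}
import org.apache.spark.sql.SaveMode

// Materialize the result into a staging table; tgt_table is only read here, not overwritten.
tgtFinal.write.mode(SaveMode.Overwrite).saveAsTable("tgt_table_staging")

// Now the overwrite of tgt_table reads only from the staging table.
sqlContext.table("tgt_table_staging")
  .write.mode(SaveMode.Overwrite)
  .saveAsTable("tgt_table")
{code}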
 

{code}

scala> tgtFinal.explain
== Physical Plan ==
Union
:- WholeStageCodegen
:  :  +- Project [col1#223,col2#224,col3#225,col4#226,batchid#227,currind#228,startdate#229,cast(enddate#230 as string) AS enddate#263,updatedate#231]
:  :     +- Filter (currind#228 = N)
:  :        +- INPUT
:  +- HiveTableScan [enddate#230,updatedate#231,col2#224,col1#223,batchid#227,col3#225,startdate#229,currind#228,col4#226], MetastoreRelation default, tgt_table, None
:- WholeStageCodegen
:  :  +- Project [col1#223,col2#224,col3#225,col4#226,batchid#227,currind#228,startdate#229,cast(enddate#230 as string) AS enddate#264,updatedate#231]
:  :     +- INPUT
:  +- Except
:     :- WholeStageCodegen
:     :  :  +- Filter (currind#228 = Y)
:     :  :     +- INPUT
:     :  +- HiveTableScan [col1#223,col2#224,col3#225,col4#226,batchid#227,currind#228,startdate#229,enddate#230,updatedate#231], MetastoreRelation default, tgt_table, None
:     +- WholeStageCodegen
:        :  +- Project [col1#223,col2#224,col3#225,col4#226,batchid#227,currind#228,startdate#229,enddate#230,updatedate#231]
:        :     +- BroadcastHashJoin [cast(col1#223 as double)], [cast(col1#219 as double)], Inner, BuildRight, None
:        :        :- Filter (currind#228 = Y)
:        :        :  +- INPUT
:        :        +- INPUT
:        :- HiveTableScan [col1#223,col2#224,col3#225,col4#226,batchid#227,currind#228,startdate#229,enddate#230,updatedate#231], MetastoreRelation default, tgt_table, None
:        +- HiveTableScan [col1#219], MetastoreRelation default, src_table, None
:- WholeStageCodegen
:  :  +- Project [col1#223,col2#224,col3#225,col4#226,batchid#227,UDF(col1#223) AS currInd#232,startdate#229,2016-03-07 15:12:20.584 AS endDate#265,1457392340584000 AS updateDate#234]
:  :     +- BroadcastHashJoin [cast(col1#223 as double)], [cast(col1#219 as double)], Inner, BuildRight, None
:  :        :- Project [col3#225,startdate#229,col2#224,col1#223,batchid#227,col4#226]
:  :        :  +- Filter (currind#228 = Y)
:  :        :     +- INPUT
:  :        +- INPUT
:  :- HiveTableScan [col3#225,startdate#229,col2#224,col1#223,batchid#227,col4#226,currind#228], MetastoreRelation default, tgt_table, None
:  +- HiveTableScan [col1#219], MetastoreRelation default, src_table, None
+- WholeStageCodegen
   :  +- Project [cast(col1#219 as string) AS col1#266,col2#220,col3#221,col4#222,UDF(cast(col1#219 as string)) AS batchId#235,UDF(cast(col1#219 as string)) AS currInd#236,1457392340584000 AS startDate#237,date_format(cast(UDF(cast(col1#219 as string)) as timestamp),yyyy-MM-dd HH:mm:ss) AS endDate#238,1457392340584000 AS updateDate#239]
   :     +- INPUT
   +- HiveTableScan [col1#219,col2#220,col3#221,col4#222], MetastoreRelation default, src_table, None

scala> 
{code}

> Spark SQL drops the table in "overwrite" mode while writing into table
> ----------------------------------------------------------------------
>
>                 Key: SPARK-13699
>                 URL: https://issues.apache.org/jira/browse/SPARK-13699
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.6.0
>            Reporter: Dhaval Modi
>         Attachments: stackTrace.txt
>
>
> Hi,
> While writing a dataframe to a Hive table with the "SaveMode.Overwrite" option,
> e.g.
> tgtFinal.write.mode(SaveMode.Overwrite).saveAsTable("tgt_table")
> sqlContext drops the table instead of truncating it.
> This causes an error while overwriting.
> Adding the stacktrace & commands to reproduce the issue.
> Thanks & Regards,
> Dhaval



