[ 
https://issues.apache.org/jira/browse/SPARK-44159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17900576#comment-17900576
 ] 

Shaojie Wu commented on SPARK-44159:
------------------------------------

Any update here?

> Commands for writing (InsertIntoHadoopFsRelationCommand and 
> InsertIntoHiveTable) should log what they are doing
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-44159
>                 URL: https://issues.apache.org/jira/browse/SPARK-44159
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Navin Kumar
>            Priority: Major
>
> Improvements from SPARK-41763 decoupled the execution of create table and 
> data writing commands in a CTAS (see SPARK-41713).
> This means that while the code is cleaner, with the v1 write implementation 
> limited to InsertIntoHadoopFsRelationCommand and InsertIntoHiveTable, the 
> execution of these operations is less clear than it was before. Previously, 
> the write command was present in the physical plan (see explain output below):
>  
> {{== Physical Plan ==}}
> {{CommandResult <empty>}}
> {{+- Execute CreateHiveTableAsSelectCommand [Database: default, TableName: 
> test_hive_text_table, InsertIntoHiveTable]}}
> {{+- *(1) Scan ExistingRDD[...]}}
> But in Spark 3.4.0, this output is:
> {{== Physical Plan ==}}
> {{CommandResult <empty>}}
> {{+- Execute CreateHiveTableAsSelectCommand}}
> {{+- CreateHiveTableAsSelectCommand [Database: default, TableName: 
> test_hive_text_table]}}
> {{+- Project [...]}}
> {{+- SubqueryAlias hive_input_table}}
> {{+- View (`hive_input_table`, [...])}}
> {{+- LogicalRDD [...], false}}
> And the write command is now missing. This makes sense since execution is 
> decoupled, but since there is no log output from InsertIntoHiveTable, there 
> is no clear way to confirm that the command actually executed. 
> I would propose that these commands log a message at the INFO level 
> indicating how many rows were written into which table, to make it easier 
> for a user to know what has happened from the Spark logs. Another option 
> may be to update the explain output in Spark 3.4 to handle this, but that 
> might be more difficult and make less sense since the operations are now 
> decoupled.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

