Navin Kumar created SPARK-44159:
-----------------------------------

             Summary: Commands for writing (InsertIntoHadoopFsRelationCommand 
and InsertIntoHiveTable) should log what they are doing
                 Key: SPARK-44159
                 URL: https://issues.apache.org/jira/browse/SPARK-44159
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.4.0
            Reporter: Navin Kumar


Improvements from SPARK-41763 decoupled the execution of create table and data 
writing commands in a CTAS (see SPARK-41713).

This means that while the code is cleaner, with the v1 write implementation 
limited to InsertIntoHadoopFsRelationCommand and InsertIntoHiveTable, the 
execution of these operations is less visible than it was before. Previously, 
the write command appeared in the physical plan (see the explain output below):
 
{{== Physical Plan ==}}
{{CommandResult <empty>}}
{{+- Execute CreateHiveTableAsSelectCommand [Database: default, TableName: test_hive_text_table, InsertIntoHiveTable]}}
{{+- *(1) Scan ExistingRDD[...]}}

But in Spark 3.4.0, this output is:

{{== Physical Plan ==}}
{{CommandResult <empty>}}
{{+- Execute CreateHiveTableAsSelectCommand}}
{{+- CreateHiveTableAsSelectCommand [Database: default, TableName: test_hive_text_table]}}
{{+- Project [...]}}
{{+- SubqueryAlias hive_input_table}}
{{+- View (`hive_input_table`, [...])}}
{{+- LogicalRDD [...], false}}

The write command no longer appears in the plan. This makes sense because 
execution is decoupled, but because InsertIntoHiveTable produces no log output, 
there is no clear way to confirm that the command actually executed.

I propose that these commands log a message at the INFO level indicating how 
many rows were written into which table, to make it easier for a user to tell 
from the Spark logs what happened. Another option may be to update the explain 
output in Spark 3.4 to show the write command, but that might be more difficult 
and make less sense now that the operations are decoupled.
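
To make the first option concrete, here is a minimal, language-neutral sketch of the kind of INFO message being proposed. It is written in plain Python with the standard logging module purely for illustration. It is not Spark code, and the function names, the logger name, and the message wording are all hypothetical. In Spark itself the commands would presumably use the existing Logging trait (logInfo), and where the row count would come from is an open question this sketch does not answer.

```python
import logging

# Hypothetical logger name; Spark would use the command class's own logger.
logger = logging.getLogger("write_command_sketch")


def format_write_summary(command_name: str, table: str, num_rows: int) -> str:
    """Build a one-line summary: which command wrote how many rows where.

    The wording is an assumption for illustration, not an agreed format.
    """
    return f"{command_name}: wrote {num_rows} rows into table {table}"


def log_write_summary(command_name: str, table: str, num_rows: int) -> str:
    """Log the summary at INFO level and return it for inspection."""
    message = format_write_summary(command_name, table, num_rows)
    logger.info(message)
    return message
```

With a message like this emitted once per write command, a user reading the driver log could confirm that InsertIntoHiveTable ran for test_hive_text_table even though the command is absent from the explain output.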



