Navin Kumar created SPARK-44159:
-----------------------------------

Summary: Commands for writing (InsertIntoHadoopFsRelationCommand and InsertIntoHiveTable) should log what they are doing
Key: SPARK-44159
URL: https://issues.apache.org/jira/browse/SPARK-44159
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.4.0
Reporter: Navin Kumar
Improvements from SPARK-41763 decoupled the execution of the create-table and data-writing commands in a CTAS (see SPARK-41713). The code is cleaner, with the v1 write implementation limited to InsertIntoHadoopFsRelationCommand and InsertIntoHiveTable, but it is less clear than before how these operations execute.

Previously, the write command was present in the physical plan, as this explain output shows:

{{== Physical Plan ==}}
{{CommandResult <empty>}}
{{+- Execute CreateHiveTableAsSelectCommand [Database: default, TableName: test_hive_text_table, InsertIntoHiveTable]}}
{{+- *(1) Scan ExistingRDD[...]}}

In Spark 3.4.0, the same output is:

{{== Physical Plan ==}}
{{CommandResult <empty>}}
{{+- Execute CreateHiveTableAsSelectCommand}}
{{+- CreateHiveTableAsSelectCommand [Database: default, TableName: test_hive_text_table]}}
{{+- Project [...]}}
{{+- SubqueryAlias hive_input_table}}
{{+- View (`hive_input_table`, [...])}}
{{+- LogicalRDD [...], false}}

The write command (InsertIntoHiveTable) is now missing. This is expected now that execution is decoupled, but because InsertIntoHiveTable produces no log output, there is no clear way to confirm that the command actually executed.

I propose that these commands log a message at INFO level stating how many rows were written into which table, so that a user can tell from the Spark logs what happened. Another option may be to update the explain output in Spark 3.4 to show the write, but that might be more difficult and make less sense now that the operations are decoupled.
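A minimal sketch of the kind of INFO-level message proposed above, assuming a simple "count the rows, then log once" design. Everything here is hypothetical: `WriteLoggingSketch`, `WriteSummary`, `message` and `writeAndLog` are not Spark APIs, and `java.util.logging` stands in for Spark's own logging. The sketch counts rows as they pass through a caller-supplied sink; inside Spark the count would more plausibly come from the write job's metrics after the distributed write finishes.

```scala
import java.util.logging.{Level, Logger}

// HYPOTHETICAL sketch: none of these names exist in Spark. It only illustrates
// logging, at INFO level, how many rows a write command wrote into which table.
object WriteLoggingSketch {
  // java.util.logging is an assumption, standing in for Spark's logging setup.
  private val log: Logger = Logger.getLogger("WriteLoggingSketch")

  // Hypothetical summary of one finished write.
  final case class WriteSummary(commandName: String, table: String, numRows: Long)

  // The message a write command could emit once its job has finished.
  def message(s: WriteSummary): String =
    s"${s.commandName}: wrote ${s.numRows} row(s) into table ${s.table}"

  // Passes every row to `writeRow` (a stand-in for the real output writer),
  // counts the rows, then logs the summary at INFO level and returns it.
  def writeAndLog[T](commandName: String, table: String, rows: Iterator[T])(
      writeRow: T => Unit): WriteSummary = {
    var count = 0L
    rows.foreach { row =>
      writeRow(row)
      count += 1
    }
    val summary = WriteSummary(commandName, table, count)
    log.log(Level.INFO, message(summary))
    summary
  }
}
```

For example, `WriteLoggingSketch.writeAndLog("InsertIntoHiveTable", "default.test_hive_text_table", Iterator("a", "b", "c"))(sink)` emits one INFO record with the message `InsertIntoHiveTable: wrote 3 row(s) into table default.test_hive_text_table`. Logging once per command, after the write completes, keeps the output to a single line per statement regardless of how many rows are written.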