[
https://issues.apache.org/jira/browse/SPARK-44159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17900576#comment-17900576
]
Shaojie Wu commented on SPARK-44159:
------------------------------------
Any update here?
> Commands for writing (InsertIntoHadoopFsRelationCommand and
> InsertIntoHiveTable) should log what they are doing
> ----------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-44159
> URL: https://issues.apache.org/jira/browse/SPARK-44159
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Navin Kumar
> Priority: Major
>
> Improvements from SPARK-41763 decoupled the execution of create table and
> data writing commands in a CTAS (see SPARK-41713).
> This means that while the code is cleaner, with the v1 write implementation
> limited to InsertIntoHadoopFsRelationCommand and InsertIntoHiveTable, the
> execution of these operations is less clear than it was before. Previously,
> the write command was present in the physical plan (see explain output below):
>
> {{== Physical Plan ==}}
> {{CommandResult <empty>}}
> {{+- Execute CreateHiveTableAsSelectCommand [Database: default, TableName:
> test_hive_text_table, InsertIntoHiveTable]}}
> {{+- *(1) Scan ExistingRDD[...]}}
> But in Spark 3.4.0, this output is:
> {{== Physical Plan ==}}
> {{CommandResult <empty>}}
> {{+- Execute CreateHiveTableAsSelectCommand}}
> {{+- CreateHiveTableAsSelectCommand [Database: default, TableName:
> test_hive_text_table]}}
> {{+- Project [...]}}
> {{+- SubqueryAlias hive_input_table}}
> {{+- View (`hive_input_table`, [...])}}
> {{+- LogicalRDD [...], false}}
> And the write command is now missing. This makes sense since execution is
> decoupled, but since there is no log output from InsertIntoHiveTable, there
> is no clear way to fully know that the command actually executed.
> I would propose that these commands log a message at the INFO level
> indicating how many rows were written into which table, to make it easier
> for a user to tell from the Spark logs what has happened. Another option
> may be to update the explain output in Spark 3.4 to handle this, but that
> might be more difficult and make less sense since the operations are now
> decoupled.