[ https://issues.apache.org/jira/browse/SPARK-43438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17736485#comment-17736485 ]
BingKun Pan edited comment on SPARK-43438 at 6/23/23 12:44 PM:
---------------------------------------------------------------
I checked and found that, after [https://github.com/apache/spark/pull/41458]:

1. Executing the SQL "INSERT INTO tabtest SELECT 1" succeeds; a default-value completion operation fills in the missing column:
[https://github.com/apache/spark/blob/cd69d4dd18cfaccf58bf64dde6268f7ea1d4415b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala#L393-L397]
[https://github.com/apache/spark/blob/cd69d4dd18cfaccf58bf64dde6268f7ea1d4415b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala#L401]
[https://github.com/apache/spark/blob/cd69d4dd18cfaccf58bf64dde6268f7ea1d4415b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala#L42]

2. Executing the SQL "INSERT INTO tabtest(c1, c2) SELECT 1" fails with:
[INSERT_COLUMN_ARITY_MISMATCH.NOT_ENOUGH_DATA_COLUMNS] Cannot write to `spark_catalog`.`default`.`tabtest`, the reason is not enough data columns: Table columns: `c1`, `c2`. Data columns: `1`.

3. Executing the SQL "INSERT INTO tabtest(c1) SELECT 1, 2, 3" fails with:
[INSERT_COLUMN_ARITY_MISMATCH.TOO_MANY_DATA_COLUMNS] Cannot write to `spark_catalog`.`default`.`tabtest`, the reason is too many data columns: Table columns: `c1`. Data columns: `1`, `2`, `3`.

Cases 2 and 3 are in line with our expectations, but the behavior difference between cases 1 and 2 is confusing. *Should we align the logic of cases 1 and 2?*
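For reference, all three cases can be reproduced end to end with the table from this ticket's description. A minimal sketch (the exact error text may vary by Spark build):

{code:sql}
-- Table definition taken from the issue description.
CREATE TABLE tabtest(c1 INT, c2 INT);

-- Case 1: no explicit column list. After PR 41458 this succeeds;
-- c2 is filled in by default-value completion (NULL, absent a declared DEFAULT).
INSERT INTO tabtest SELECT 1;

-- Case 2: the column list names two columns but the query supplies one.
-- Fails with INSERT_COLUMN_ARITY_MISMATCH.NOT_ENOUGH_DATA_COLUMNS.
INSERT INTO tabtest(c1, c2) SELECT 1;

-- Case 3: the column list names one column but the query supplies three.
-- Fails with INSERT_COLUMN_ARITY_MISMATCH.TOO_MANY_DATA_COLUMNS.
INSERT INTO tabtest(c1) SELECT 1, 2, 3;
{code}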
> Fix mismatched column list error on INSERT
> ------------------------------------------
>
> Key: SPARK-43438
> URL: https://issues.apache.org/jira/browse/SPARK-43438
> Project: Spark
> Issue Type: Sub-task
> Components: Spark Core
> Affects Versions: 3.4.0
> Reporter: Serge Rielau
> Priority: Major
>
> This error message is pretty bad, and common:
> "_LEGACY_ERROR_TEMP_1038" : {
>   "message" : [
>     "Cannot write to table due to mismatched user specified column size(<columnSize>) and data column size(<outputSize>)."
>   ]
> },
> It can perhaps be merged with this one, after giving it an ERROR_CLASS:
> "_LEGACY_ERROR_TEMP_1168" : {
>   "message" : [
>     "<tableName> requires that the data to be inserted have the same number of columns as the target table: target table has <targetColumns> column(s) but the inserted data has <insertedColumns> column(s), including <staticPartCols> partition column(s) having constant value(s)."
>   ]
> },
> Repro:
> CREATE TABLE tabtest(c1 INT, c2 INT);
> INSERT INTO tabtest SELECT 1;
> `spark_catalog`.`default`.`tabtest` requires that the data to be inserted have the same number of columns as the target table: target table has 2 column(s) but the inserted data has 1 column(s), including 0 partition column(s) having constant value(s).
> INSERT INTO tabtest(c1) SELECT 1, 2, 3;
> Cannot write to table due to mismatched user specified column size(1) and data column size(3).; line 1 pos 24
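As a side note on case 1 of the comment above: a hypothetical sketch of what default-value completion means when a column declares an explicit DEFAULT (assumes Spark 3.4+ column DEFAULT support; the table name and values here are illustrative, not from the ticket):

{code:sql}
-- Hypothetical table, not from this ticket: c2 declares an explicit DEFAULT.
CREATE TABLE tabdefault(c1 INT, c2 INT DEFAULT 42);

-- Only one data column is supplied; c2 should be completed with its
-- default (42) rather than raising an arity error.
INSERT INTO tabdefault SELECT 1;

-- Expected row: (1, 42).
SELECT * FROM tabdefault;
{code}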