[ https://issues.apache.org/jira/browse/SPARK-43438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17736485#comment-17736485 ]

BingKun Pan edited comment on SPARK-43438 at 6/23/23 12:44 PM:
---------------------------------------------------------------

I checked and found that after [https://github.com/apache/spark/pull/41458], the behavior is as follows:

1. When executing the SQL "INSERT INTO tabtest SELECT 1", it executes successfully (reproduced in the sketch after case 3).

There is a default-value completion operation that pads the missing column:

[https://github.com/apache/spark/blob/cd69d4dd18cfaccf58bf64dde6268f7ea1d4415b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala#L393-L397]

[https://github.com/apache/spark/blob/cd69d4dd18cfaccf58bf64dde6268f7ea1d4415b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala#L401]

[https://github.com/apache/spark/blob/cd69d4dd18cfaccf58bf64dde6268f7ea1d4415b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala#L42]

 

2. When executing the SQL "INSERT INTO tabtest(c1, c2) SELECT 1", the error is as follows:

[INSERT_COLUMN_ARITY_MISMATCH.NOT_ENOUGH_DATA_COLUMNS] Cannot write to 
`spark_catalog`.`default`.`tabtest`, the reason is not enough data columns:
Table columns: `c1`, `c2`.
Data columns: `1`.

 

3. When executing the SQL "INSERT INTO tabtest(c1) SELECT 1, 2, 3", the error is as follows:

[INSERT_COLUMN_ARITY_MISMATCH.TOO_MANY_DATA_COLUMNS] Cannot write to 
`spark_catalog`.`default`.`tabtest`, the reason is too many data columns:
Table columns: `c1`.
Data columns: `1`, `2`, `3`.
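
For reference, case 1 can be reproduced in spark-shell roughly as follows. This is a minimal sketch: the local session setup and the USING parquet clause are my assumptions, not taken from the PR; the table layout follows the repro in this issue.

{code:scala}
import org.apache.spark.sql.SparkSession

// Assumed setup: a local session; table layout follows the repro above.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("SPARK-43438-repro")
  .getOrCreate()

spark.sql("CREATE TABLE tabtest(c1 INT, c2 INT) USING parquet")

// Case 1: no user-specified column list. Per the linked code, the analyzer
// pads the missing column c2 with its default value (NULL here), so this
// statement succeeds.
spark.sql("INSERT INTO tabtest SELECT 1")
spark.sql("SELECT * FROM tabtest").show()
// Expect one row: c1 = 1, c2 = null
{code}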

 

Cases 2 and 3 are in line with our expectations.

But the behavioral difference between cases 1 and 2 is a bit confusing: omitting the column list triggers default-value padding, while spelling out the full column list does not (see the sketch below).
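
To make the asymmetry concrete, here is a sketch of cases 2 and 3, continuing the assumed session from the sketch above; the error classes match the messages quoted earlier.

{code:scala}
import org.apache.spark.sql.AnalysisException

// Case 2: explicit column list (c1, c2), but only one data column.
// Unlike case 1, no default-value padding happens here.
try {
  spark.sql("INSERT INTO tabtest(c1, c2) SELECT 1")
} catch {
  case e: AnalysisException =>
    println(e.getErrorClass) // INSERT_COLUMN_ARITY_MISMATCH.NOT_ENOUGH_DATA_COLUMNS
}

// Case 3: explicit column list (c1), but three data columns.
try {
  spark.sql("INSERT INTO tabtest(c1) SELECT 1, 2, 3")
} catch {
  case e: AnalysisException =>
    println(e.getErrorClass) // INSERT_COLUMN_ARITY_MISMATCH.TOO_MANY_DATA_COLUMNS
}
{code}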

 

*Should we align the logic of cases 1 and 2?*



> Fix mismatched column list error on INSERT
> ------------------------------------------
>
>                 Key: SPARK-43438
>                 URL: https://issues.apache.org/jira/browse/SPARK-43438
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Spark Core
>    Affects Versions: 3.4.0
>            Reporter: Serge Rielau
>            Priority: Major
>
> This error message is pretty bad, and common
> "_LEGACY_ERROR_TEMP_1038" : {
> "message" : [
> "Cannot write to table due to mismatched user specified column 
> size(<columnSize>) and data column size(<outputSize>)."
> ]
> },
> It can perhaps be merged with this one - after giving it an ERROR_CLASS
> "_LEGACY_ERROR_TEMP_1168" : {
> "message" : [
> "<tableName> requires that the data to be inserted have the same number of 
> columns as the target table: target table has <targetColumns> column(s) but 
> the inserted data has <insertedColumns> column(s), including <staticPartCols> 
> partition column(s) having constant value(s)."
> ]
> },
> Repro:
> CREATE TABLE tabtest(c1 INT, c2 INT);
> INSERT INTO tabtest SELECT 1;
> `spark_catalog`.`default`.`tabtest` requires that the data to be inserted 
> have the same number of columns as the target table: target table has 2 
> column(s) but the inserted data has 1 column(s), including 0 partition 
> column(s) having constant value(s).
> INSERT INTO tabtest(c1) SELECT 1, 2, 3;
> Cannot write to table due to mismatched user specified column size(1) and 
> data column size(3).; line 1 pos 24
>  


