[ 
https://issues.apache.org/jira/browse/SPARK-32131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

philipse updated SPARK-32131:
-----------------------------
    Description: 
Union and set operations can only be performed on tables with compatible 
column types. However, when a table has more than two columns, the error 
message can report the wrong column index. Steps to reproduce:

Step 1: prepare the test data
{code:java}
drop table if exists test1; 
drop table if exists test2; 
drop table if exists test3;
create table if not exists test1(id int, age int, name timestamp);
create table if not exists test2(id int, age timestamp, name timestamp);
create table if not exists test3(id int, age int, name int);
insert into test1 select 1,2,'2020-01-01 01:01:01';
insert into test2 select 1,'2020-01-01 01:01:01','2020-01-01 01:01:01'; 
insert into test3 select 1,3,4;
{code}
Step 2: run the queries
{code:java}
Query1:
select * from test1 except select * from test2;
Result1:
Error: org.apache.spark.sql.AnalysisException: Except can only be performed on 
tables with the compatible column types. timestamp <> int at the second column 
of the second table;; 'Except false :- Project [id#620, age#621, name#622] : +- 
SubqueryAlias `default`.`test1` : +- HiveTableRelation `default`.`test1`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [id#620, age#621, name#622] 
+- Project [id#623, age#624, name#625] +- SubqueryAlias `default`.`test2` +- 
HiveTableRelation `default`.`test2`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [id#623, age#624, name#625] 
(state=,code=0)
Query2:
select * from test1 except select * from test3;
Result2:
Error: org.apache.spark.sql.AnalysisException: Except can only be performed on 
tables with the compatible column types. int <> timestamp at the 2th column of 
the second table;; 'Except false :- Project [id#632, age#633, name#634] : +- 
SubqueryAlias `default`.`test1` : +- HiveTableRelation `default`.`test1`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [id#632, age#633, name#634] 
+- Project [id#635, age#636, name#637] +- SubqueryAlias `default`.`test3` +- 
HiveTableRelation `default`.`test3`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [id#635, age#636, name#637] 
(state=,code=0)
{code}
The error for query 1 is correct, but query 2 reports the wrong column: the 
mismatch is actually at the third column.
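
The mismatch in query 2 can be located mechanically: comparing the two schemas column by column shows that the first (and only) incompatible pair is the third column, {{name}} (timestamp vs. int). A minimal standalone sketch (illustrative helper, not Spark code):
{code:python}
# Return the 1-based position of the first type mismatch between two
# schemas, or None if all column types agree.
# Hypothetical helper for illustration only; not part of Spark.
def first_mismatch(left_types, right_types):
    for i, (lt, rt) in enumerate(zip(left_types, right_types), start=1):
        if lt != rt:
            return i
    return None

test1 = ["int", "int", "timestamp"]  # id, age, name
test3 = ["int", "int", "int"]        # id, age, name

pos = first_mismatch(test1, test3)
# pos == 3: the mismatch is at the third column, not the second
{code}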

The message contains the wrong column index:

+Error: org.apache.spark.sql.AnalysisException: Except can only be performed on 
tables with the compatible column types. int <> timestamp at the *2th* column 
of the second table+

It should instead say:

+Error: org.apache.spark.sql.AnalysisException: Except can only be performed on 
tables with the compatible column types. int <> timestamp at the *third* column 
of the second table+
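
The fix presumably needs to turn the analyzer's zero-based column index into a proper English ordinal instead of naively appending "th". A rough sketch of such a conversion, assuming a 0-based index internally (names and behavior are illustrative, not Spark's actual implementation):
{code:python}
# Convert a 0-based column index to an English ordinal word for small
# indices, falling back to a numeric suffix ("4th", "21st", ...) otherwise.
# Illustrative only; the actual fix in Spark may differ.
WORDS = ["first", "second", "third"]

def ordinal(index0):
    if index0 < len(WORDS):
        return WORDS[index0]
    n = index0 + 1
    if 10 <= n % 100 <= 20:          # 11th, 12th, 13th are irregular
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

# ordinal(2) -> "third", matching the expected message above
{code}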


> union and set operations have wrong exception information
> ---------------------------------------------------------
>
>                 Key: SPARK-32131
>                 URL: https://issues.apache.org/jira/browse/SPARK-32131
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: philipse
>            Priority: Minor
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
