[ https://issues.apache.org/jira/browse/SPARK-27551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun updated SPARK-27551:
----------------------------------
    Issue Type: Improvement  (was: Bug)

> Uniformative error message for mismatched types in when().otherwise()
> ---------------------------------------------------------------------
>
>                 Key: SPARK-27551
>                 URL: https://issues.apache.org/jira/browse/SPARK-27551
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Huon Wilson
>            Priority: Major
>
> When a {{when(...).otherwise(...)}} construct has a type error, the error
> message can be quite uninformative, since it just splats out a potentially
> large chunk of code and says the types don't match. For instance:
> {code:none}
> scala> spark.range(100).select(when('id === 1, array(struct('id * 123456789 +
> 123456789 as "x"))).otherwise(array(struct('id * 987654321 + 987654321 as
> "y"))))
> org.apache.spark.sql.AnalysisException: cannot resolve 'CASE WHEN (`id` =
> CAST(1 AS BIGINT)) THEN array(named_struct('x', ((`id` * CAST(123456789 AS
> BIGINT)) + CAST(123456789 AS BIGINT)))) ELSE array(named_struct('y', ((`id` *
> CAST(987654321 AS BIGINT)) + CAST(987654321 AS BIGINT)))) END' due to data
> type mismatch: THEN and ELSE expressions should all be same type or coercible
> to a common type;;
> ...
> {code}
> The problem is the structs have different field names ({{x}} vs {{y}}), but
> it's not obvious that this is the case, and this is a relatively simple case
> of a single {{select}} expression.
> It would be great for the error message to at least include the types that
> Spark has computed, to help clarify what might have gone wrong.
> For instance, {{greatest}} and {{least}} write out the expression with the
> types instead of values:
> {code:none}
> scala> spark.range(100).select(greatest('id, struct(lit("x"))))
> org.apache.spark.sql.AnalysisException: cannot resolve 'greatest(`id`,
> named_struct('col1', 'x'))' due to data type mismatch: The expressions should
> all have the same type, got GREATEST(bigint, struct<col1:string>).;;
> {code}
> For the example above, this might look like:
> {code:none}
> org.apache.spark.sql.AnalysisException: cannot resolve 'CASE WHEN (`id` =
> CAST(1 AS BIGINT)) THEN array(named_struct('x', ((`id` * CAST(123456789 AS
> BIGINT)) + CAST(123456789 AS BIGINT)))) ELSE array(named_struct('y', ((`id` *
> CAST(987654321 AS BIGINT)) + CAST(987654321 AS BIGINT)))) END' due to data
> type mismatch: THEN and ELSE expressions should all be same type or coercible
> to a common type, got CASE WHEN ... THEN array<struct<x:bigint>> ELSE
> array<struct<y:bigint>> END;;
> {code}
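
Until the message includes the computed types, one possible way to see them is to select each branch as its own column and print the resulting schema. The sketch below is only an illustration and is not part of the issue: the object name {{BranchTypes}}, the helper {{branchTypes}}, and the column labels {{then_branch}} / {{else_branch}} are made up here, and a local {{SparkSession}} is assumed (in spark-shell the existing {{spark}} value could be used instead).

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array, col, struct, when}

// Hypothetical helper object, not part of Spark.
object BranchTypes {
  // Select each branch as its own named column and render the type Spark computes for it.
  def branchTypes(df: DataFrame, branches: (String, Column)*): Seq[String] = {
    val projected = df.select(branches.map { case (name, c) => c.as(name) }: _*)
    projected.schema.fields.toSeq.map(f => s"${f.name}: ${f.dataType.catalogString}")
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is acceptable for this diagnostic.
    val spark = SparkSession.builder().master("local[1]").appName("branch-types").getOrCreate()
    val df = spark.range(100).toDF()

    // The two branches from the report, built separately.
    val thenBranch = array(struct((col("id") * 123456789 + 123456789).as("x")))
    val elseBranch = array(struct((col("id") * 987654321 + 987654321).as("y")))

    // Prints one line per branch with its computed type.
    branchTypes(df, "then_branch" -> thenBranch, "else_branch" -> elseBranch).foreach(println)

    // With the struct field name aligned, both branches share one type
    // and the when/otherwise expression resolves.
    val elseFixed = array(struct((col("id") * 987654321 + 987654321).as("x")))
    df.select(when(col("id") === 1, thenBranch).otherwise(elseFixed).as("v")).printSchema()

    spark.stop()
  }
}
```

The two printed types should correspond to the {{array<struct<x:bigint>>}} and {{array<struct<y:bigint>>}} shown in the suggested message above, which makes the differing field names visible without reading the full CASE WHEN expression.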