Re: mysterious spark.sql.utils.AnalysisException Union in spark 3.3.2, but not seen in 3.4.0+

2023-08-25 Thread Mich Talebzadeh
Hi Srivatsan,

Some initial points to investigate:

   1. Does this union exist explicitly in your code? If not, where do the
   7-column and 6-column inputs come from?
   2. On 3.3.2, have you looked at the Spark UI and the relevant DAG diagram?
   3. Check the query execution plan using explain().
   4. Can you reproduce this error on 3.3.2 using a smaller sample of data
   and a simplified query?
   5. Check the Spark 3.3.2 and 3.4 release notes for any changes and
   bug fixes relevant to this case.
   6. Have you reported this issue to the EMR user group?
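
As a rough sketch of points 1 and 3, assuming PySpark: the helper names
diff_columns and inspect_union_inputs below are made up for illustration and
are not part of Spark. They rely only on the DataFrame attributes .columns and
.explain(), so treat this as a starting point rather than a definitive
diagnosis.

```python
def diff_columns(left_cols, right_cols):
    """Return (only_in_left, only_in_right), preserving the input order."""
    left_set, right_set = set(left_cols), set(right_cols)
    only_left = [c for c in left_cols if c not in right_set]
    only_right = [c for c in right_cols if c not in left_set]
    return only_left, only_right


def inspect_union_inputs(left_df, right_df):
    """Print the column mismatch and the extended plan of each input.

    Hypothetical helper: it accepts any objects exposing `.columns` and
    `.explain(extended)`, which PySpark DataFrames do.
    """
    only_left, only_right = diff_columns(left_df.columns, right_df.columns)
    print(f"left has {len(left_df.columns)} columns, "
          f"right has {len(right_df.columns)} columns")
    print(f"only in left:  {only_left}")
    print(f"only in right: {only_right}")
    # explain(True) prints the parsed, analyzed, optimized and physical plans,
    # which is where a Union node not written in the code would show up.
    left_df.explain(True)
    right_df.explain(True)
    return only_left, only_right
```

Calling inspect_union_inputs(df_a, df_b) on the two DataFrames that feed the
failing step (df_a and df_b are placeholders) should show which column
accounts for the 7 versus 6 difference, and the printed plans can then be
compared between 3.3.2 and 3.4.0.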


HTH

Mich Talebzadeh,
Distinguished Technologist, Solutions Architect & Engineer
London
United Kingdom


 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Fri, 25 Aug 2023 at 16:01, Srivatsan vn wrote:

> Hello Users,
>
> I have been seeing some weird issues since I upgraded my
> EMR setup to 6.11 (which uses Spark 3.3.2). The call stack seems to point
> to a code location where there is no explicit union, and I use
> unionByName everywhere in the codebase with allowMissingColumns set to
> True. I suspect the union reported in the exception is inserted into
> the plan by the Spark optimizer?
>
> spark.sql.utils.AnalysisException: Union can only be performed on tables
> with the same number of columns, but the first table has 7 columns and the
> second table has 6 columns
>
> The issue seems to have disappeared when I did a quick test with Spark
> 3.4.0 in my local setup. I am just curious whether this is a known issue in
> the Spark user/dev community or whether I am missing something.
>
>
> Thanks
>
> Srivatsan
>


mysterious spark.sql.utils.AnalysisException Union in spark 3.3.2, but not seen in 3.4.0+

2023-08-25 Thread Srivatsan vn
Hello Users,

   I have been seeing some weird issues since I upgraded my
EMR setup to 6.11 (which uses Spark 3.3.2). The call stack seems to point
to a code location where there is no explicit union, and I use
unionByName everywhere in the codebase with allowMissingColumns set to
True. I suspect the union reported in the exception is inserted into
the plan by the Spark optimizer?

spark.sql.utils.AnalysisException: Union can only be performed on tables
with the same number of columns, but the first table has 7 columns and the
second table has 6 columns

The issue seems to have disappeared when I did a quick test with Spark
3.4.0 in my local setup. I am just curious whether this is a known issue in
the Spark user/dev community or whether I am missing something.


Thanks

Srivatsan