Thanks Silvio!
In the meantime, with the help of Adam and a code review of WholeStageCodegenExec
and CollapseCodegenStages, I found out that anything that's codegen'd runs
inside the tasks of a stage, so it is as parallel as those tasks are. In this
case, the union of two codegen'd subtrees is indeed parallel.
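
A minimal sketch for checking this in spark-shell, assuming two made-up
8-partition inputs built with spark.range (the names left, right, unioned and
numPartitions are hypothetical, not taken from the original question). It only
prints the physical plan and counts the partitions of the union; it does not
prove anything about timing.

```scala
import org.apache.spark.sql.SparkSession

// In spark-shell this reuses the existing session; standalone it starts a local one.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("union-partitions-sketch")
  .getOrCreate()

// Two hypothetical inputs with 8 partitions each (assumption for illustration).
val left = spark.range(0, 1000, 1, 8).toDF("id")
val right = spark.range(0, 1000, 1, 8).toDF("id")

val unioned = left.union(right)

// Prints the physical plan so the two subtrees under the union can be inspected.
unioned.explain()

// The union keeps the partitions of both sides, so 8 + 8 partitions
// means one task per partition in the stage that reads them.
val numPartitions = unioned.rdd.getNumPartitions
println(s"partitions after union: $numPartitions")
```

With no shuffle between the reads and the union, the partition count is the
number of tasks in that single stage.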
Regards,
Jacek Laskowski
Hi Jacek,
Just replied to the SO thread as well, but…
Yes, your first statement is correct. The DFs in the union are read in the same
stage, so in your example, where each DF has 8 partitions, you have a stage
with 16 tasks to read the 2 DFs. There's no need to define the DF in a separate
Hi,
I've been trying to find the answer to the question about UNION ALL and
SELECTs at https://stackoverflow.com/q/47837955/1305344
> If I have Spark SQL statement of the form SELECT [...] UNION ALL SELECT
> [...], will the two SELECT statements be executed in parallel? In my
> specific use case