Re: How to...UNION ALL of two SELECTs over different data sources in parallel?

2017-12-17 Thread Jacek Laskowski
Thanks Silvio! In the meantime, with Adam's help and a code review of WholeStageCodegenExec and CollapseCodegenStages, I found out that anything that's codegen'd is as fast as the tasks in a stage. In this case, the union of two codegen'd subtrees is indeed parallel. Regards, Jacek Laskowski
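One way to check which subtrees of a union are code-generated is to inspect the executed physical plan. The following is only a sketch under assumptions: it runs against a local SparkSession, uses two stand-in DataFrames built with spark.range in place of the real data sources (which the thread does not show), and the object and helper names are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.execution.WholeStageCodegenExec

// Hypothetical helper object, for illustration only.
object UnionCodegenSketch {
  // Counts the WholeStageCodegenExec nodes in the executed physical plan.
  def codegenStages(df: DataFrame): Int =
    df.queryExecution.executedPlan.collect { case w: WholeStageCodegenExec => w }.size

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")               // assumption: local mode
      .appName("union-codegen-sketch")
      .getOrCreate()
    try {
      // Stand-ins for the two SELECTs over different data sources.
      val left  = spark.range(0, 1000).toDF("id")
      val right = spark.range(1000, 2000).toDF("id")
      val both  = left.union(right)

      // Prints the physical plan; code-generated operators are
      // prefixed with an asterisk in the output.
      both.explain()
      println(s"codegen stages: ${codegenStages(both)}")
    } finally {
      spark.stop()
    }
  }
}
```

In the explain() output, the operators under the Union node that carry an asterisk are the ones that were collapsed into a WholeStageCodegenExec.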

Re: How to...UNION ALL of two SELECTs over different data sources in parallel?

2017-12-16 Thread Silvio Fiorito
Hi Jacek, Just replied to the SO thread as well, but… Yes, your first statement is correct. The DFs in the union are read in the same stage, so in your example, where each DF has 8 partitions, you have a stage with 16 tasks to read the 2 DFs. There's no need to define the DF in a separate
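The task count described above can be checked with a small sketch. It assumes a local SparkSession and two stand-in DataFrames created with spark.range and 8 partitions each (the real data sources are not shown in the thread); the object and method names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper object, for illustration only.
object UnionTasksSketch {
  // Returns the partition counts of the left side, the right side, and their union.
  def partitionCounts(spark: SparkSession, perSide: Int): (Int, Int, Int) = {
    // spark.range(start, end, step, numPartitions) fixes the number of partitions.
    val left  = spark.range(0, 1000, 1, perSide).toDF("id")
    val right = spark.range(1000, 2000, 1, perSide).toDF("id")
    val both  = left.union(right)
    (left.rdd.getNumPartitions, right.rdd.getNumPartitions, both.rdd.getNumPartitions)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")             // assumption: local mode
      .appName("union-tasks-sketch")
      .getOrCreate()
    try {
      val (l, r, u) = partitionCounts(spark, perSide = 8)
      // The union keeps the partitions of both sides, so one stage reads
      // both DataFrames with one task per partition.
      println(s"left=$l right=$r union=$u")
    } finally {
      spark.stop()
    }
  }
}
```

With 8 partitions per side, the union should report 16 partitions, which corresponds to the single stage with 16 tasks mentioned in the reply.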

How to...UNION ALL of two SELECTs over different data sources in parallel?

2017-12-16 Thread Jacek Laskowski
Hi, I've been trying to find the answer to the question about UNION ALL and SELECTs at https://stackoverflow.com/q/47837955/1305344
> If I have Spark SQL statement of the form SELECT [...] UNION ALL SELECT [...], will the two SELECT statements be executed in parallel? In my specific use case