Here is the information that was missing:

1. Spark 3.2.0
2. It is Scala based.
3. Size of the tables will be ~60 GB.
4. The explain plan (Catalyst) shows a lot of time being spent creating the plan.
5. Number of unioned tables is 2, then another 2, and finally 2.
The slowness in producing results increases as the data size and the number of columns grow. (A sketch of one common mitigation is below the quoted thread.)

On Wed, Feb 22, 2023 at 11:07 AM Enrico Minack <i...@enrico.minack.dev> wrote:

> Plus the number of unioned tables would be helpful, as well as which
> downstream operations are performed on the unioned tables.
>
> And what "performance issues" do you exactly measure?
>
> Enrico
>
> On 22.02.23 at 16:50, Mich Talebzadeh wrote:
>
> Hi,
>
> A few details will help:
>
> 1. Spark version
> 2. Spark SQL, Scala or PySpark
> 3. Size of the tables in the join
> 4. What does explain() on the joining operation show?
>
> HTH
>
> On Wed, 22 Feb 2023 at 15:42, Prem Sahoo <prem.re...@gmail.com> wrote:
>
>> Hello Team,
>> We are observing Spark Union performance issues when unioning big
>> tables with lots of rows. Do we have any option apart from the Union?
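Since the explain output points at time spent creating the plan, below is a minimal Scala sketch of one common mitigation: truncating the lineage between unions with Dataset.localCheckpoint(), so Catalyst analyzes a short tree at each step instead of one deep union tree. The loader, paths, and DataFrame names (df1a, df1b, ...) are hypothetical stand-ins for the three pairs of unioned tables, not the actual job.

import org.apache.spark.sql.{DataFrame, SparkSession}

object UnionPlanSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("union-plan-sketch").getOrCreate()

    // Hypothetical loader and paths; replace with the real ~60 GB tables.
    def load(t: String): DataFrame = spark.read.parquet(s"/data/$t")
    val (df1a, df1b) = (load("t1a"), load("t1b"))
    val (df2a, df2b) = (load("t2a"), load("t2b"))
    val (df3a, df3b) = (load("t3a"), load("t3b"))

    // Naive chained union: each union() grows the logical plan, and
    // Catalyst re-analyzes the whole tree, which is where plan-creation
    // time can add up as tables and columns grow.
    val naive = Seq(df1a, df1b, df2a, df2b, df3a, df3b).reduce(_ union _)
    naive.explain()

    // Mitigation sketch: cut the lineage after each pairwise union with
    // localCheckpoint(), so every later plan starts from a short tree.
    val step1 = df1a.union(df1b).localCheckpoint()
    val step2 = step1.union(df2a.union(df2b)).localCheckpoint()
    val result = step2.union(df3a.union(df3b))

    result.explain() // compare plan-creation time against the naive chain

    spark.stop()
  }
}

localCheckpoint() eagerly materializes the intermediate result on the executors, trading extra storage and compute for shorter plans; checkpoint() with a configured checkpoint directory is the fault-tolerant variant. Whether either helps depends on whether the time really goes into plan analysis rather than execution, which explain() and the Spark UI timings should confirm.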