Hi,

An interesting question that, I must admit, I'm not sure how to answer myself :)
Off the top of my head, I'd **guess** that unless you cache the first query, these two queries would share nothing. With caching, there's a phase in query execution when a canonicalized version of the query is used to look up any cached queries.

Again, I'm not really sure, and if I had to answer it (e.g. as part of an interview) I'd say nothing would be shared or re-used.

Regards,
Jacek Laskowski
----
https://about.me/JacekLaskowski
"The Internals Of" Online Books <https://books.japila.pl/>
Follow me on https://twitter.com/jaceklaskowski

On Wed, Jan 13, 2021 at 5:39 PM Koert Kuipers <ko...@tresata.com> wrote:

> is shuffle file re-use based on identity or equality of the dataframe?
>
> for example if i run the exact same code twice to load data and do
> transforms (joins, aggregations, etc.) but without re-using any actual
> dataframes, will i still see skipped stages thanks to shuffle file re-use?
>
> thanks!
> koert
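
To make the identity-versus-equality distinction concrete, here is a toy model in plain Python. It is not Spark code and does not mirror Spark's actual classes: `Plan`, `canonicalize`, `ToyCache` and `build_query` are hypothetical names made up for illustration. It only sketches the idea from the reply, namely that two queries built by running the same code twice are different objects, and that a cache can still match them if it looks entries up by a canonicalized form.

```python
import itertools
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical per-instance counter: stands in for the cosmetic details
# that differ every time a plan is built.
_ids = itertools.count()


@dataclass(frozen=True)
class Plan:
    """A toy, linear query plan (hypothetical, not a Spark class)."""
    op: str
    child: Optional["Plan"] = None
    plan_id: int = field(default_factory=lambda: next(_ids))


def canonicalize(plan: Optional[Plan]) -> tuple:
    """Drop the per-instance ids and keep only the structure of the plan."""
    ops = []
    while plan is not None:
        ops.append(plan.op)
        plan = plan.child
    return tuple(ops)


class ToyCache:
    """A cache keyed by the canonicalized plan (an assumption of this sketch)."""

    def __init__(self) -> None:
        self._entries = {}

    def cache(self, plan: Plan, data: list) -> None:
        self._entries[canonicalize(plan)] = data

    def lookup(self, plan: Plan) -> Optional[list]:
        return self._entries.get(canonicalize(plan))


def build_query() -> Plan:
    """'The exact same code', run each time it is called."""
    return Plan("filter:x > 1", Plan("scan:events"))


first = build_query()
second = build_query()

store = ToyCache()
store.cache(first, ["row1", "row2"])

print(first is second)       # identity: different objects
print(first == second)       # equality including ids: still different
print(store.lookup(second))  # canonicalized lookup finds the cached entry
```

In this toy the second query gets a hit only because the first one was explicitly cached. Without that step nothing connects the two plans, which is in line with the guess above that nothing would be shared.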