I don’t know your exact underlying business problem, but maybe a graph solution such as Spark GraphX would better meet your requirements. Self-joins are usually done to address some kind of graph problem (even if you would not describe it as such), and a graph solution is much more efficient for this kind of problem.
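To illustrate what I mean by a self-join being a graph problem in disguise, here is a minimal plain-Python sketch. It does not use Spark or GraphX at all; the edge data and the function names are made up for illustration. It computes all two-hop pairs, once as a self-join on an edge table and once by following edges in an adjacency list, and both formulations return the same pairs.

```python
from collections import defaultdict

# Hypothetical edge table: (src, dst) rows, as they might sit in a table.
EDGES = [("a", "b"), ("b", "c"), ("b", "d"), ("c", "e")]


def two_hop_self_join(edges):
    """Self-join formulation: join the table with itself on left.dst == right.src."""
    return sorted(
        (l_src, r_dst)
        for l_src, l_dst in edges
        for r_src, r_dst in edges
        if l_dst == r_src
    )


def two_hop_graph(edges):
    """Graph formulation: build an adjacency list once, then follow two edges."""
    adjacency = defaultdict(list)
    for src, dst in edges:
        adjacency[src].append(dst)
    return sorted(
        (src, far)
        for src, mids in list(adjacency.items())
        for mid in mids
        for far in adjacency.get(mid, [])
    )


print(two_hop_self_join(EDGES))
print(two_hop_graph(EDGES))
```

The nested loop in the join version is deliberately naive, and a real engine would pick a hash or sort-merge join instead, so this only shows that the two formulations are equivalent, not how they compare in cost.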

> On 11.12.2018 at 12:44, Marco Gaido <marcogaid...@gmail.com> wrote:
>
> Hi all,
>
> I'd like to bring to the attention of more people a problem which has been
> there for a long time, i.e., self joins. Currently, we have a lot of trouble
> with them. This has been reported several times to the community and seems to
> affect many people, but as of now no solution has been accepted for it.
>
> I created a PR some time ago in order to address the problem
> (https://github.com/apache/spark/pull/21449), but Wenchen mentioned he tried
> to fix this problem too, and so far no attempt has been successful because
> there is no clear semantics
> (https://github.com/apache/spark/pull/21449#issuecomment-393554552).
>
> So I'd like to propose that we discuss here which is the best approach for
> tackling this issue, which I think would be great to fix for 3.0.0, so if we
> decide to introduce breaking changes in the design, we can do that.
>
> Thoughts on this?
>
> Thanks,
> Marco