I don’t know your exact underlying business problem, but maybe a graph
solution, such as Spark GraphX, would meet your requirements better. Usually
self-joins are done to address some kind of graph problem (even if you would
not describe it as such), and a graph solution is much more efficient for
these kinds of problems.

> On 11.12.2018 at 12:44, Marco Gaido <marcogaid...@gmail.com> wrote:
> 
> Hi all,
> 
> I'd like to bring to the attention of more people a problem which has been
> there for a long time, i.e., self-joins. Currently, we have many problems
> with them. This has been reported several times to the community and seems
> to affect many people, but as of now no solution has been accepted for it.
> 
> I created a PR some time ago in order to address the problem
> (https://github.com/apache/spark/pull/21449), but Wenchen mentioned that he
> also tried to fix this problem and that so far no attempt has been
> successful because there are no clear semantics
> (https://github.com/apache/spark/pull/21449#issuecomment-393554552).
> 
> So I'd like to propose discussing here what the best approach is for
> tackling this issue, which I think would be great to fix for 3.0.0, so if we
> decide to introduce breaking changes in the design, we can do that.
> 
> Thoughts on this?
> 
> Thanks,
> Marco
