RE: generateTreeString causes huge performance problems on dataframe persistence

2015-06-17 Thread Cheng, Hao
It seems you're hitting the self-join case: currently Spark SQL won't cache any result or logical tree for further analysis or computation in a self-join. Since the logical tree is huge, it's reasonable that generating its tree string recursively takes a long time. I also doubt the computation can finish.
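
To illustrate why a recursively generated tree string can blow up here, the following is a minimal Scala sketch. It assumes, as a simplification, that each self-join embeds the entire existing plan as both of its children. `Plan`, `Scan`, `Join`, `treeString`, `selfJoin`, and `visited` are hypothetical names for illustration only. They are not Spark's `TreeNode` API, and this is not how Spark actually generates its tree string.

```scala
// Hypothetical, simplified stand-in for a logical plan; NOT Spark's TreeNode API.
sealed trait Plan
final case class Scan(name: String) extends Plan
final case class Join(left: Plan, right: Plan) extends Plan

object TreeStringDemo {
  // Renders the plan one node per line, indenting children. Like any recursive
  // tree-string generator, it walks every child of every node, even when both
  // children of a Join are the same object.
  def treeString(p: Plan, depth: Int = 0, sb: StringBuilder = new StringBuilder): StringBuilder = {
    sb.append("  " * depth)
    p match {
      case Scan(n) =>
        sb.append(s"Scan $n\n")
      case Join(l, r) =>
        sb.append("Join\n")
        treeString(l, depth + 1, sb)
        treeString(r, depth + 1, sb)
    }
    sb
  }

  // Joins the plan with itself n times. Each step allocates only one new Join
  // (both children point at the same object), so memory grows linearly.
  def selfJoin(p: Plan, n: Int): Plan =
    if (n == 0) p else selfJoin(Join(p, p), n - 1)

  // Number of nodes a full recursive traversal visits, which equals the
  // number of lines treeString would emit: 2^(n+1) - 1 after n self-joins.
  def visited(p: Plan): Long = p match {
    case Scan(_)    => 1L
    case Join(l, r) => 1L + visited(l) + visited(r)
  }

  def main(args: Array[String]): Unit = {
    for (n <- Seq(1, 5, 10, 20))
      println(s"self-joins=$n visited nodes=${visited(selfJoin(Scan("t"), n))}")
    // Small enough to print: 7 lines for two self-joins.
    print(treeString(selfJoin(Scan("t"), 2)))
  }
}
```

Running `main` should report 3, 63, 2047 and 2097151 visited nodes, even though `selfJoin(Scan("t"), 20)` allocates only 21 distinct objects. The cost of walking the plan as a tree doubles with every self-join, which is consistent with the tree string taking a very long time to generate.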

Re: generateTreeString causes huge performance problems on dataframe persistence

2015-06-17 Thread Jan-Paul Bultmann
> It seems you're hitting the self-join case: currently Spark SQL won't cache any result or logical tree for further analysis or computation in a self-join.

Other joins don't suffer from this problem?

> Since the logical tree is huge, it's reasonable that generating its tree string takes a long time.