Re: ToLocalIterator vs collect

2017-01-05 Thread Richard Startin
Why not do that with Spark SQL, so the work runs on the executors rather than as a sequential filter on the driver?

    SELECT * FROM A LEFT JOIN B ON A.fk = B.fk WHERE B.pk IS NULL LIMIT k

If you were sorting just so you could iterate in order, this might save you a couple of sorts too.
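To make the semantics of that query concrete, here is a minimal sketch that runs the same left-join / IS NULL pattern against an in-memory SQLite database from the Python standard library. SQLite is only a stand-in for Spark SQL here. The columns `pk` and `fk` follow the query above, while the function name `first_k_unmatched`, the sample rows, and the added `ORDER BY A.pk` (there only to make the output deterministic) are assumptions for illustration.

```python
import sqlite3


def first_k_unmatched(a_rows, b_rows, k):
    """Return up to k rows of A whose fk has no matching fk in B."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE A (pk INTEGER, fk INTEGER)")
    conn.execute("CREATE TABLE B (pk INTEGER, fk INTEGER)")
    conn.executemany("INSERT INTO A VALUES (?, ?)", a_rows)
    conn.executemany("INSERT INTO B VALUES (?, ?)", b_rows)
    # Rows of A without a partner in B come out of the left join with
    # B.pk set to NULL, so filtering on that keeps only the unmatched rows.
    rows = conn.execute(
        "SELECT A.pk, A.fk FROM A LEFT JOIN B ON A.fk = B.fk "
        "WHERE B.pk IS NULL ORDER BY A.pk LIMIT ?",
        (k,),
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    a = [(1, 10), (2, 20), (3, 30), (4, 40)]
    b = [(100, 20), (101, 40)]
    print(first_k_unmatched(a, b, 5))
```

In Spark the same statement would be planned as a distributed join, so no row has to pass through the driver until the final k results are fetched.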

ToLocalIterator vs collect

2017-01-05 Thread Rohit Verma
Hi all, I am aware that collect returns a list aggregated on the driver, which will cause an OOM error when the list is too big. Is toLocalIterator safe to use with a very big list? I want to access all values one by one. Basically, the goal is to compare two sorted RDDs (A and B) to find top k
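
For reference, a one-by-one comparison of two sorted sequences could be sketched as below, using plain Python iterators in place of RDDs. It assumes the goal is the first k values of A that do not occur in B (the reading suggested by the reply in this thread) and that both inputs are sorted ascending. The names `missing_from_b` and `first_k_missing` are hypothetical. In PySpark the two inputs could come from `rdd.toLocalIterator()`, which brings partitions to the driver one at a time instead of materialising the whole dataset as `collect` does.

```python
from itertools import islice

_END = object()  # sentinel marking an exhausted iterator


def missing_from_b(sorted_a, sorted_b):
    """Lazily yield the items of sorted_a that do not occur in sorted_b."""
    b_iter = iter(sorted_b)
    b = next(b_iter, _END)
    for a in sorted_a:
        # Advance B until it catches up with the current item of A.
        while b is not _END and b < a:
            b = next(b_iter, _END)
        if b is _END or b != a:
            yield a


def first_k_missing(sorted_a, sorted_b, k):
    """Return at most k items of sorted_a that are absent from sorted_b."""
    # islice stops pulling from both iterators once k items are found.
    return list(islice(missing_from_b(sorted_a, sorted_b), k))


if __name__ == "__main__":
    print(first_k_missing([1, 2, 3, 5, 8], [2, 3, 4, 8], 2))
```

Only one element of each input is held at a time, so memory use is bounded by whatever the underlying iterators buffer. The comparison itself still runs sequentially on the driver.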