As Paul said, it really depends on what you want to do with the data you collect. Writing it to a file might be a better option than collecting it on the driver, but again, that depends on what you plan to do with it afterwards.
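
To make the "write it to a file" idea concrete, here is a minimal sketch. It assumes the rows can be consumed one at a time through a `java.util.Iterator`, which is what `Dataset.toLocalIterator()` returns. `RowSink` and `writeAll` are hypothetical names for illustration, not part of Spark, and the sample data in `main` stands in for a real query result so the snippet runs without a cluster.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical helper (not part of Spark): drains an iterator one element at a
// time so the full result never has to sit in a single in-memory List.
final class RowSink {

    // Writes each element on its own line and returns how many were written.
    static long writeAll(Iterator<String> rows, Path out) throws IOException {
        long count = 0;
        try (BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            while (rows.hasNext()) {
                writer.write(rows.next());
                writer.newLine();
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in data. With Spark, the iterator would instead come from
        // df.toLocalIterator(), with each Row converted to a String
        // (for example via row.mkString(",")).
        Iterator<String> rows = Arrays.asList("1,foo", "2,bar", "3,baz").iterator();
        Path out = Files.createTempFile("rows", ".csv");
        long written = writeAll(rows, out);
        System.out.println(written + " rows written to " + out);
    }
}
```

If the data only needs to end up on disk, letting the executors write it in parallel (for example `df.write().csv(path)`) is probably faster still, because nothing has to pass through the driver at all.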

Regards,
Keith.

http://keith-chapman.com

On Tue, Apr 4, 2017 at 7:38 AM, Eike von Seggern <eike.segg...@sevenval.com> wrote:

> Hi,
>
> depending on what you're trying to achieve `RDD.toLocalIterator()` might
> help you.
>
> Best
>
> Eike
>
>
> 2017-03-29 21:00 GMT+02:00 szep.laszlo.it <szep.laszlo...@gmail.com>:
>
>> Hi,
>>
>> after I created a dataset
>>
>> Dataset<Row> df = sqlContext.sql("query");
>>
>> I need to have a result values and I call a method: collectAsList()
>>
>> List<Row> list = df.collectAsList();
>>
>> But it's very slow, if I work with large datasets (20-30 million
>> records). I know, that the result isn't presented in driver app, that's
>> why it takes long time, because collectAsList() collect all data from
>> worker nodes.
>>
>> But then what is the right way to get result values? Is there an other
>> solution to iterate over a result dataset rows, or get values? Can anyone
>> post a small & working example?
>>
>> Thanks & Regards,
>> Laszlo Szep