Hi all,
Just wanted to thank everyone for the Dataset API - most of the time we only see bug reports on these lists ;o).
- To put this in context: this weekend I was updating the SQL chapters of my book - it had all the ugliness of SchemaRDD, registerTempTable, take(10).foreach(println) and take(30).foreach(e=>println("%15s | %9.2f |".format(e(0),e(1)))) ;o)
- I remember Hossein Falaki chiding me about the ugly println statements !
- Took me a little while to grok the Dataset, SparkSession, spark.read.option("header","true").option("inferSchema","true").csv(...) et al.
- I am a big R fan and know the language pretty well - so the constructs are familiar
- Once I got it (I am sure there are still more mysteries to uncover ...) it was just beautiful - well done folks !!!
- One sees the contrast a lot better while teaching or writing books, because one has to think through the old, the new and the transitional arc
- I even remember the good old days when we were discussing, at one of Paco's sessions, whether Spark would get dataframes like R !
- And now, it looks very decent for data wrangling.

Cheers & keep up the good work
<k/>

P.S: My next chapter is MLlib - need to convert to ml. Should be interesting ... I am a glutton for punishment - of the Spark kind, of course !