Hi all,
Just wanted to thank you all for the Dataset API - most of the time we
see only bug reports on these lists ;o).
- For some context: this weekend I was updating the SQL chapters of
my book - it had all the ugliness of SchemaRDD,
registerTempTable, take(10).foreach(println)
and take(30).foreach(e=>println("%15s | %9.2f |".format(e(0),e(1)))) ;o)
- I remember Hossein Falaki chiding me about the ugly println statements!
- It took me a little while to grok Dataset, SparkSession,
spark.read.option("header","true").option("inferSchema","true").csv(...)
et al.
- I am a big R fan and know the language pretty well - so the
constructs are familiar
- Once I got it (I am sure there are still more mysteries to uncover
...) it was just beautiful - well done, folks!!!
- One sees the contrast a lot better while teaching or writing books,
because one has to think through the old, the new and the transitional arc
- I even remember the good old days when, at one of Paco's sessions, we
were discussing whether Spark would get DataFrames like R's!
- And now, it looks very decent for data wrangling.
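
For anyone following the old-versus-new contrast above, here is a minimal
sketch of the newer style, assuming Spark 2.x is on the classpath. The file
path data/sales.csv, the view name "sales" and the app name are hypothetical
placeholders, not taken from the book.

```scala
import org.apache.spark.sql.SparkSession

object CsvToDataFrameSketch {
  def main(args: Array[String]): Unit = {
    // SparkSession is the single entry point in Spark 2.x
    val spark = SparkSession.builder()
      .appName("csv-to-dataframe-sketch") // hypothetical app name
      .master("local[*]")                 // local mode, for illustration only
      .getOrCreate()

    // Read a CSV with a header row and let Spark infer the column types
    val df = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("data/sales.csv")              // hypothetical path

    df.printSchema()
    df.show(10)                           // stands in for take(10).foreach(println)

    // createOrReplaceTempView is the Spark 2.x successor to registerTempTable
    df.createOrReplaceTempView("sales")   // hypothetical view name
    spark.sql("SELECT * FROM sales LIMIT 30").show()

    spark.stop()
  }
}
```

show() prints an aligned table, so the hand-rolled format strings in the
old println calls are no longer needed for a quick look at the data.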
Cheers & keep up the good work
<k/>
P.S.: My next chapter is on MLlib - I need to convert it to ml. Should be
interesting ... I am a glutton for punishment - of the Spark kind, of
course!