Hi pesudo, I wrote a blog post about this a while ago: spark-dataframe-introduction <http://litaotao.github.io/spark-dataframe-introduction?s=gmail>. In my own workflow, I use a Spark DataFrame (or RDD) to run the computation over the full dataset, then convert the result into a pandas DataFrame and do the data visualization from there. Sometimes you will also need matplotlib or seaborn for the plots.
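Here is a minimal sketch of that workflow, assuming PySpark, pandas and matplotlib are installed. The column names, sample rows, output file name and the helper names (`aggregate_with_spark`, `plot_summary`, `main`) are made up for illustration and are not taken from the blog post.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend, so this also runs without a display

import matplotlib.pyplot as plt
import pandas as pd


def aggregate_with_spark(spark, rows):
    """Run the heavy computation in Spark and return a small pandas DataFrame."""
    # Imported here so the plotting helper below can be used without Spark.
    from pyspark.sql import functions as F

    # Hypothetical schema: one price observation per row.
    sdf = spark.createDataFrame(rows, ["symbol", "price"])
    summary = (
        sdf.groupBy("symbol")
        .agg(F.avg("price").alias("avg_price"))
        .orderBy("symbol")
    )
    # toPandas() collects the result to the driver, so call it only on
    # the reduced result, not on the full dataset.
    return summary.toPandas()


def plot_summary(pdf: pd.DataFrame, path: str = "avg_price.png") -> str:
    """Draw a bar chart from the pandas DataFrame and save it to `path`."""
    ax = pdf.plot(kind="bar", x="symbol", y="avg_price", legend=False)
    ax.set_ylabel("average price")
    ax.figure.tight_layout()
    ax.figure.savefig(path)
    plt.close(ax.figure)
    return path


def main():
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[*]")
        .appName("spark-to-pandas")
        .getOrCreate()
    )
    rows = [("AAA", 10.0), ("AAA", 12.0), ("BBB", 20.0)]  # made-up sample data
    pdf = aggregate_with_spark(spark, rows)
    print(plot_summary(pdf))
    spark.stop()
```

Calling `main()` runs the whole thing against a local Spark session. The point of the split is that Spark does the aggregation, and only the small summary crosses over into pandas for plotting.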
--
Quant | Engineer | Boy
blog: http://litaotao.github.io
github: www.github.com/litaotao