pyspark read json file with high dimensional sparse data

2016-03-30 Thread Yavuz Nuzumlalı
Hi all, I'm trying to read data from a JSON file using the `SQLContext.read.json()` method. However, the read operation does not finish. My data has 29x3100 dimensions, but it is actually very sparse, so if there is a way to read JSON directly into a sparse DataFrame, it would work perfect
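
One possible workaround, sketched under assumptions the post does not state: if each line of the file is a JSON object and the full list of column names is known in advance, each record can be parsed with the standard `json` module and reduced to index/value pairs, so that no wide, mostly empty schema has to be inferred. The function name `to_sparse_triplet` and the column names `f0` .. `f3099` are hypothetical; the returned tuple has the `(size, indices, values)` shape that `Vectors.sparse` in `pyspark.mllib.linalg` accepts.

```python
import json


def to_sparse_triplet(line, column_index):
    """Parse one JSON record into (size, indices, values), keeping non-zero fields only."""
    record = json.loads(line)
    # Sort by column position so the indices come out strictly increasing.
    pairs = sorted(
        (column_index[name], float(value))
        for name, value in record.items()
        if name in column_index and value not in (None, 0, 0.0)
    )
    indices = [i for i, _ in pairs]
    values = [v for _, v in pairs]
    return len(column_index), indices, values


# Hypothetical layout: 3100 numeric columns named f0 .. f3099.
column_index = {"f%d" % i: i for i in range(3100)}
size, indices, values = to_sparse_triplet('{"f7": 1.5, "f2": 3, "f10": 0}', column_index)
```

In Spark, a function like this could be mapped over the lines returned by `sc.textFile(path)`, assuming the file holds one record per line. Whether this avoids the slow read depends on where the time is actually being spent.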

Re: Plot DataFrame with matplotlib

2016-03-30 Thread Yavuz Nuzumlalı
> ot3d
> (http://matplotlib.org/mpl_toolkits/mplot3d/tutorial.html#mpl_toolkits.mplot3d.Axes3D.scatter),
> it only supports "array-like" input data.
>
> So yes, to use matplotlib, you need to take the elements out of the RDD
> and send them to the plot API as a list object.
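
The step described in the quoted reply, turning collected rows into plain lists, can be sketched as follows. This is a minimal illustration, not the poster's code: `rows_to_columns` is a hypothetical helper, and the sample `rows` stand in for data that would first be collected to the driver.

```python
def rows_to_columns(rows):
    """Transpose a list of equal-length rows into one list per coordinate."""
    return [list(column) for column in zip(*rows)]


# Stand-in for collected data. In Spark this might come from something like
# [row.pca_features.toArray().tolist() for row in df.collect()]
# (assumes a vector column named pca_features with three components).
rows = [[0.1, 0.2, 0.3], [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
xs, ys, zs = rows_to_columns(rows)

# With matplotlib installed, xs, ys and zs are "array-like" and could be
# passed to Axes3D.scatter(xs, ys, zs).
```

Collecting brings every row to the driver, so this only suits results small enough to fit in local memory.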

Re: Plot DataFrame with matplotlib

2016-03-23 Thread Yavuz Nuzumlalı
eng...@gmail.com> wrote:
> not sure about 3d plot, but there is a nice example:
>
> https://github.com/zalando/spark-appliance/blob/master/examples/notebooks/PySpark_sklearn_matplotlib.ipynb
>
> for plotting rdd or dataframe using matplotlib.
>
> On Wednesday, 23 March 2016, Y

Plot DataFrame with matplotlib

2016-03-23 Thread Yavuz Nuzumlalı
Hi all, I'm trying to plot the result of a simple PCA operation, but couldn't find clear documentation on plotting data frames. Here is the output of my data frame: ++ |pca_features