Re: Creating Spark DataFrame from large pandas DataFrame

2015-08-21 Thread ayan guha
The easiest option I found is to put the jars in SPARK_CLASSPATH.
On 21 Aug 2015 06:20, Burak Yavuz brk...@gmail.com wrote:
If you would like to try using spark-csv, please use `pyspark --packages com.databricks:spark-csv_2.11:1.2.0`. You're missing a dependency.
Best, Burak
On Thu, Aug 20, 2015

Re: Creating Spark DataFrame from large pandas DataFrame

2015-08-20 Thread Burak Yavuz
If you would like to try using spark-csv, please use `pyspark --packages com.databricks:spark-csv_2.11:1.2.0`. You're missing a dependency.
Best, Burak
On Thu, Aug 20, 2015 at 1:08 PM, Charlie Hack charles.t.h...@gmail.com wrote:
Hi, I'm new to spark and am trying to create a Spark df from a

Creating Spark DataFrame from large pandas DataFrame

2015-08-20 Thread Charlie Hack
Hi, I'm new to Spark and am trying to create a Spark df from a pandas df with ~5 million rows. I'm using Spark 1.4.1. When I type:
`df = sqlContext.createDataFrame(pandas_df.where(pd.notnull(didf), None))`
(the df.where is a hack I found on the Spark JIRA to avoid a problem with NaN values making
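For readers following the thread, the NaN-to-None step described in the question can also be sketched in plain Python, without relying on `DataFrame.where`. This is only an illustration, not the poster's or Spark's method: the helper names `nan_to_none`, `clean_rows`, and `in_batches` are hypothetical, the sample records are made up, and the idea of feeding the cleaned rows to `sqlContext.createDataFrame` in batches is an assumption rather than something the thread confirms.

```python
import math
from typing import Any, Dict, Iterable, Iterator, List


def nan_to_none(value: Any) -> Any:
    """Return None for a float NaN; return any other value unchanged."""
    if isinstance(value, float) and math.isnan(value):
        return None
    return value


def clean_rows(rows: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield copies of the row dicts with NaN values replaced by None."""
    for row in rows:
        yield {key: nan_to_none(val) for key, val in row.items()}


def in_batches(rows: Iterable[Dict[str, Any]], size: int) -> Iterator[List[Dict[str, Any]]]:
    """Group rows into lists of at most `size` items (hypothetical helper)."""
    batch: List[Dict[str, Any]] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


# Made-up sample records; with pandas these might come from
# pandas_df.to_dict("records") (an assumption, not from the thread).
sample = [
    {"id": 1, "score": 0.5},
    {"id": 2, "score": float("nan")},
    {"id": 3, "score": 1.5},
]

batches = list(in_batches(clean_rows(sample), size=2))
# Each batch could then be handed to sqlContext.createDataFrame(batch)
# (assumption: not run or verified here).
```

Batching is shown only because the question involves roughly 5 million rows; whether it helps in practice depends on the cluster and driver memory.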