Hi, I want to read a CSV file in PySpark
I am running PySpark from PyCharm and am trying to load a CSV file:

```python
import os
import sys

os.environ['SPARK_HOME'] = "/Users/devesh/Downloads/spark-1.5.1-bin-hadoop2.6"
sys.path.append("/Users/devesh/Downloads/spark-1.5.1-bin-hadoop2.6/python/")

# Now we are ready to import Spark modules
try:
    from pyspark import SparkContext
    from pyspark import SparkConf
    from pyspark.mllib.fpm import FPGrowth
    print("Successfully imported all Spark Modules")
except ImportError as e:
    print("Error importing Spark Modules", e)
    sys.exit(1)

sc = SparkContext('local')

from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)

df = sqlContext.read.format('com.databricks.spark.csv') \
    .options(header='true', inferschema='true') \
    .load('/Users/devesh/work/iris/iris.csv')
```

I am getting the following error:

```
Py4JJavaError: An error occurred while calling o88.load.
: java.lang.ClassNotFoundException: Failed to load class for data source: com.databricks.spark.csv.
	at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:67)
```

Warm regards,
Devesh
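For context, my understanding is that `com.databricks.spark.csv` is an external package (spark-csv) that has to be on the JVM classpath before the `SparkContext` is created, not something bundled with Spark 1.5.1. Here is a minimal sketch of how I believe the package is normally supplied when running outside the `pyspark` shell; the Maven coordinates and version below are my assumption, not something I have confirmed for this Spark build:

```python
import os

# Sketch (assumption): ask spark-submit to fetch the spark-csv package
# before the SparkContext starts. This must be set *before* any
# SparkContext is constructed; 'pyspark-shell' at the end is required
# so the arguments are treated as shell options.
os.environ['PYSPARK_SUBMIT_ARGS'] = (
    '--packages com.databricks:spark-csv_2.10:1.5.0 pyspark-shell'
)
```

If that is the right mechanism, the rest of the script above would stay unchanged.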