Hello all,

I am new to Spark and I want to analyze a CSV file using Spark on my local 
machine. The CSV file contains an airline database, and I want to compute a few 
descriptive statistics (e.g. the maximum of one column, the mean, the standard 
deviation of a column, etc.) for my file. I am currently reading the file with a 
simple sc.textFile("file.csv"). My questions are:


1.      Is there an optimal way of reading the file so that loading takes less 
time in Spark? The file can be around 3 GB.

2.      How should I handle column manipulations for the kinds of queries 
described above?

Thank you

Regards,
Vineet Hingorani

