On 31 Dec 2016, at 16:09, Raymond Xie wrote:
Hello Felix,
I followed the instructions and ran the command:
> $SPARK_HOME/bin/spark-shell --packages com.databricks:spark-csv_2.11:1.5.0
and I received the following error message:
Hmm this would seem unrelated? Does it work on the same box without the
package? Do you have more of the error stack you can share?
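Felix's hunch can be checked without Spark at all: localhost:9000 in the quoted ConnectException is where a pseudo-distributed HDFS setup commonly points fs.defaultFS (the NameNode RPC address), so a plain socket probe tells you whether anything is listening there. A minimal stdlib sketch, with a made-up helper name:

```python
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# localhost:9000 is the address from the stack trace; False here means
# the NameNode is not reachable, which would explain the ConnectException
# regardless of whether the spark-csv package is loaded.
print(port_open("localhost", 9000))
```

If this prints False, start HDFS (or fix core-site.xml) before retrying the spark-shell command.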
From: Raymond Xie
Sent: Saturday, December 31, 2016 8:09 AM
Subject: Re: How to load a big csv to dataframe in Spark 1.6
Hello Felix,
I followed the instructions and ran the command:
> $SPARK_HOME/bin/spark-shell --packages com.databricks:spark-csv_2.11:1.5.0
and I received the following error message:
java.lang.RuntimeException: java.net.ConnectException: Call From xie1/192.168.112.150 to localhost:9000 failed
Thanks Felix, I will try it tomorrow
~~~sent from my cell phone, sorry if there is any typo
On 30 Dec 2016, at 10:08 PM, "Felix Cheung" wrote:
> Have you tried the spark-csv package?
>
> https://spark-packages.org/package/databricks/spark-csv
>
>
> --
yes, I believe there should be a better way to handle my case.
~~~sent from my cell phone, sorry if there is any typo
On 30 Dec 2016, at 10:09 PM, "write2sivakumar@gmail" wrote:
Hi Raymond,
Is your problem how to pass those 100 fields to the .toDF() method?
Sent from my Samsung
You can use the StructType and StructField approach, or use the inferSchema approach.
Sent from my T-Mobile 4G LTE Device
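To make both suggestions concrete, here is a minimal sketch against the Spark 1.6 API. The helper name `columns_from_header` and the header string are made-up examples, and the cluster-side calls are shown as comments because they need a live SparkContext:

```python
# Instead of typing 100 names into .toDF(), derive the column list from
# the csv's own header line.
def columns_from_header(header_line, delimiter=","):
    """Split a header line into a clean list of column names."""
    return [name.strip() for name in header_line.split(delimiter)]

header = "Employee_ID,Employee_name,Department"  # e.g. first line of the csv
print(columns_from_header(header))
# -> ['Employee_ID', 'Employee_name', 'Department']

# With an RDD already loaded (Spark 1.6), the pieces fit together as:
#   header = Employee_rdd.first()
#   data = Employee_rdd.filter(lambda line: line != header) \
#                      .map(lambda line: line.split(","))
#   Employee_df = data.toDF(columns_from_header(header))
#
# Or, with an explicit StructType/StructField schema:
#   from pyspark.sql.types import StructType, StructField, StringType
#   schema = StructType([StructField(n, StringType(), True)
#                        for n in columns_from_header(header)])
#   Employee_df = sqlContext.createDataFrame(data, schema)
```

The explicit-schema route lets you pick proper types per column; inferSchema (via the spark-csv reader) trades a second pass over the data for not writing the schema at all.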
Original message
From: "write2sivakumar@gmail"
Date: 12/30/16 10:08 PM (GMT-05:00)
To: Raymond Xie
Have you tried the spark-csv package?
https://spark-packages.org/package/databricks/spark-csv
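The linked package reads the csv straight into a DataFrame, so no column list is needed. A hedged sketch of its typical Spark 1.6 usage ('Employee.csv' is a stand-in path, and the reader calls are commented out since they need a live SparkContext):

```python
# DataSource name registered by the spark-csv package.
CSV_FORMAT = "com.databricks.spark.csv"

# With the shell started as
#   $SPARK_HOME/bin/pyspark --packages com.databricks:spark-csv_2.11:1.5.0
# the load is a single call -- no hand-written list of 100 columns:
#
#   df = (sqlContext.read
#         .format(CSV_FORMAT)
#         .option("header", "true")       # first row supplies column names
#         .option("inferSchema", "true")  # derive column types from the data
#         .load("Employee.csv"))
#   df.printSchema()
```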
From: Raymond Xie
Sent: Friday, December 30, 2016 6:46:11 PM
To: user@spark.apache.org
Subject: How to load a big csv to dataframe in Spark 1.6
Original message
From: Raymond Xie
Date: 31/12/2016 10:46 (GMT+08:00)
To: user@spark.apache.org
Subject: How to load a big csv to dataframe in Spark 1.6
Hello,
I see this is usually the way to load a csv into a dataframe:
sqlContext = SQLContext(sc)
Employee_rdd = sc.textFile("\..\Employee.csv") \
    .map(lambda line: line.split(","))
Employee_df = Employee_rdd.toDF(['Employee_ID','Employee_name'])
Employee_df.show()
However, in my case the csv has 100+ fields, so typing every column name into .toDF() is not practical.