Re: How to load a big csv to dataframe in Spark 1.6

2017-01-03 Thread Steve Loughran

On 31 Dec 2016, at 16:09, Raymond Xie wrote:

Hello Felix,

I followed the instruction and ran the command:

> $SPARK_HOME/bin/spark-shell --packages com.databricks:spark-csv_2.11:1.5.0

and I received the following error message:
java.lang.RuntimeException: java.net.ConnectException: Call From
xie1/192.168.112.150 to localhost:9000 failed on connection exception:
java.net.ConnectException: Connection refused; For more details see:
http://wiki.apache.org/hadoop/ConnectionRefused



Did you look at the wiki page? If not, why not?
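The wiki page's short answer: nothing is listening on localhost:9000, which here is almost certainly the HDFS NameNode address Spark is resolving bare paths against (fs.defaultFS). A minimal sketch of the two usual fixes, assuming PySpark with the spark-csv package already on the classpath; the path below is a placeholder:

# Option 1: start HDFS so a NameNode actually listens on port 9000
# (e.g. $HADOOP_HOME/sbin/start-dfs.sh), then retry the load.
# Option 2: skip HDFS entirely and read from the local disk with an
# explicit file:// URI, which never touches localhost:9000.
df = sqlContext.read.format("com.databricks.spark.csv") \
    .options(header="true") \
    .load("file:///home/xie/Employee.csv")  # placeholder local path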



Re: How to load a big csv to dataframe in Spark 1.6

2016-12-31 Thread Felix Cheung
Hmm this would seem unrelated? Does it work on the same box without the 
package? Do you have more of the error stack you can share?


_________________________________
From: Raymond Xie
Sent: Saturday, December 31, 2016 8:09 AM
Subject: Re: How to load a big csv to dataframe in Spark 1.6
To: Felix Cheung


Hello Felix,

I followed the instruction and ran the command:

> $SPARK_HOME/bin/spark-shell --packages com.databricks:spark-csv_2.11:1.5.0

and I received the following error message:
java.lang.RuntimeException: java.net.ConnectException: Call From
xie1/192.168.112.150 to localhost:9000 failed on connection exception:
java.net.ConnectException: Connection refused; For more details see:
http://wiki.apache.org/hadoop/ConnectionRefused

Any thoughts?




Sincerely yours,


Raymond

On Fri, Dec 30, 2016 at 10:08 PM, Felix Cheung wrote:
Have you tried the spark-csv package?

https://spark-packages.org/package/databricks/spark-csv
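For the archive, a minimal sketch of what that looks like in PySpark 1.x, following the package's README (header and inferSchema options shown for illustration):

# Load the CSV straight into a DataFrame: column names come from the
# header row, and inferSchema samples the data to pick column types,
# so no hand-written toDF() list of 100+ names is needed.
df = sqlContext.read.format("com.databricks.spark.csv") \
    .options(header="true", inferSchema="true") \
    .load("Employee.csv")
df.printSchema()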



From: Raymond Xie
Sent: Friday, December 30, 2016 6:46:11 PM
To: user@spark.apache.org
Subject: How to load a big csv to dataframe in Spark 1.6

Hello,

I see there is usually this way to load a csv to dataframe:


sqlContext = SQLContext(sc)

Employee_rdd = sc.textFile("\..\Employee.csv") \
    .map(lambda line: line.split(","))

Employee_df = Employee_rdd.toDF(['Employee_ID','Employee_name'])

Employee_df.show()

However in my case my csv has 100+ fields, which means toDF() will be very 
lengthy.

Can anyone tell me a practical method to load the data?

Thank you very much.


Raymond
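A practical sketch that sidesteps typing 100+ names, assuming the file's first line is a header row (untested against Raymond's data):

raw = sc.textFile("Employee.csv")
header = raw.first()                       # the header line
names = header.split(",")                  # all 100+ column names, for free
rows = raw.filter(lambda line: line != header) \
          .map(lambda line: line.split(","))
Employee_df = rows.toDF(names)
Employee_df.show()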






Re: How to load a big csv to dataframe in Spark 1.6

2016-12-31 Thread Raymond Xie
Hello Felix,

I followed the instruction and ran the command:

> $SPARK_HOME/bin/spark-shell --packages com.databricks:spark-csv_2.11:1.5.0

and I received the following error message:
java.lang.RuntimeException: java.net.ConnectException: Call From
xie1/192.168.112.150 to localhost:9000 failed on connection exception:
java.net.ConnectException: Connection refused; For more details see:
http://wiki.apache.org/hadoop/ConnectionRefused

Any thoughts?



Sincerely yours,


Raymond

On Fri, Dec 30, 2016 at 10:08 PM, Felix Cheung wrote:

> Have you tried the spark-csv package?
>
> https://spark-packages.org/package/databricks/spark-csv
>
>
> ------------------------------
> From: Raymond Xie
> Sent: Friday, December 30, 2016 6:46:11 PM
> To: user@spark.apache.org
> Subject: How to load a big csv to dataframe in Spark 1.6
>
> Hello,
>
> I see there is usually this way to load a csv to dataframe:
>
> sqlContext = SQLContext(sc)
>
> Employee_rdd = sc.textFile("\..\Employee.csv") \
>     .map(lambda line: line.split(","))
>
> Employee_df = Employee_rdd.toDF(['Employee_ID','Employee_name'])
>
> Employee_df.show()
>
> However in my case my csv has 100+ fields, which means toDF() will be very
> lengthy.
>
> Can anyone tell me a practical method to load the data?
>
> Thank you very much.
>
>
> Raymond
>
>


Re: How to load a big csv to dataframe in Spark 1.6

2016-12-30 Thread Raymond Xie
Thanks Felix, I will try it tomorrow

~~~sent from my cell phone, sorry if there is any typo

On Dec 30, 2016 at 10:08 PM, "Felix Cheung" wrote:

> Have you tried the spark-csv package?
>
> https://spark-packages.org/package/databricks/spark-csv
>
>
> ------------------------------
> From: Raymond Xie
> Sent: Friday, December 30, 2016 6:46:11 PM
> To: user@spark.apache.org
> Subject: How to load a big csv to dataframe in Spark 1.6
>
> Hello,
>
> I see there is usually this way to load a csv to dataframe:
>
> sqlContext = SQLContext(sc)
>
> Employee_rdd = sc.textFile("\..\Employee.csv") \
>     .map(lambda line: line.split(","))
>
> Employee_df = Employee_rdd.toDF(['Employee_ID','Employee_name'])
>
> Employee_df.show()
>
> However in my case my csv has 100+ fields, which means toDF() will be very
> lengthy.
>
> Can anyone tell me a practical method to load the data?
>
> Thank you very much.
>
>
> Raymond
>
>


Re: How to load a big csv to dataframe in Spark 1.6

2016-12-30 Thread Raymond Xie
yes, I believe there should be a better way to handle my case.

~~~sent from my cell phone, sorry if there is any typo

On Dec 30, 2016 at 10:09 PM, "write2sivakumar@gmail" wrote:

Hi Raymond,

Is your problem passing those 100 fields to the .toDF() method?



Sent from my Samsung device


-------- Original message --------
From: Raymond Xie 
Date: 31/12/2016 10:46 (GMT+08:00)
To: user@spark.apache.org
Subject: How to load a big csv to dataframe in Spark 1.6

Hello,

I see there is usually this way to load a csv to dataframe:

sqlContext = SQLContext(sc)

Employee_rdd = sc.textFile("\..\Employee.csv") \
    .map(lambda line: line.split(","))

Employee_df = Employee_rdd.toDF(['Employee_ID','Employee_name'])

Employee_df.show()

However in my case my csv has 100+ fields, which means toDF() will be very
lengthy.

Can anyone tell me a practical method to load the data?

Thank you very much.


Raymond


Re: How to load a big csv to dataframe in Spark 1.6

2016-12-30 Thread theodondre


You can use the StructType/StructField approach to define the schema explicitly, or use the inferSchema approach.
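To make both concrete, a sketch assuming PySpark 1.x with the spark-csv package on the classpath; the field names are placeholders, and with 100+ columns the StructType is best built in a loop rather than written out by hand:

from pyspark.sql.types import StructType, StructField, StringType

# Explicit schema: build the StructType programmatically.
field_names = ["Employee_ID", "Employee_name"]   # extend to all 100+ names
schema = StructType([StructField(n, StringType(), True) for n in field_names])
df = sqlContext.read.format("com.databricks.spark.csv") \
    .schema(schema) \
    .load("Employee.csv")

# inferSchema: let spark-csv sample the data and guess each column's type.
df2 = sqlContext.read.format("com.databricks.spark.csv") \
    .options(header="true", inferSchema="true") \
    .load("Employee.csv")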


Sent from my T-Mobile 4G LTE Device

-------- Original message --------
From: "write2sivakumar@gmail"  
Date: 12/30/16  10:08 PM  (GMT-05:00) 
To: Raymond Xie , user@spark.apache.org 
Subject: Re: How to load a big csv to dataframe in Spark 1.6 



Hi Raymond,
Is your problem passing those 100 fields to the .toDF() method?


Sent from my Samsung device

-------- Original message --------
From: Raymond Xie  
Date: 31/12/2016  10:46  (GMT+08:00) 
To: user@spark.apache.org 
Subject: How to load a big csv to dataframe in Spark 1.6 

Hello,
I see there is usually this way to load a csv to dataframe:
sqlContext = SQLContext(sc)

Employee_rdd = sc.textFile("\..\Employee.csv") \
    .map(lambda line: line.split(","))

Employee_df = Employee_rdd.toDF(['Employee_ID','Employee_name'])

Employee_df.show()

However in my case my csv has 100+ fields, which means toDF() will be very lengthy.
Can anyone tell me a practical method to load the data?
Thank you very much.

Raymond







Re: How to load a big csv to dataframe in Spark 1.6

2016-12-30 Thread Felix Cheung
Have you tried the spark-csv package?

https://spark-packages.org/package/databricks/spark-csv



From: Raymond Xie 
Sent: Friday, December 30, 2016 6:46:11 PM
To: user@spark.apache.org
Subject: How to load a big csv to dataframe in Spark 1.6

Hello,

I see there is usually this way to load a csv to dataframe:


sqlContext = SQLContext(sc)

Employee_rdd = sc.textFile("\..\Employee.csv") \
    .map(lambda line: line.split(","))

Employee_df = Employee_rdd.toDF(['Employee_ID','Employee_name'])

Employee_df.show()

However in my case my csv has 100+ fields, which means toDF() will be very 
lengthy.

Can anyone tell me a practical method to load the data?

Thank you very much.


Raymond



Re: How to load a big csv to dataframe in Spark 1.6

2016-12-30 Thread write2sivakumar@gmail


Hi Raymond,
Is your problem passing those 100 fields to the .toDF() method?


Sent from my Samsung device

-------- Original message --------
From: Raymond Xie  
Date: 31/12/2016  10:46  (GMT+08:00) 
To: user@spark.apache.org 
Subject: How to load a big csv to dataframe in Spark 1.6 

Hello,
I see there is usually this way to load a csv to dataframe:
sqlContext = SQLContext(sc)

Employee_rdd = sc.textFile("\..\Employee.csv") \
    .map(lambda line: line.split(","))

Employee_df = Employee_rdd.toDF(['Employee_ID','Employee_name'])

Employee_df.show()

However in my case my csv has 100+ fields, which means toDF() will be very lengthy.
Can anyone tell me a practical method to load the data?
Thank you very much.

Raymond







How to load a big csv to dataframe in Spark 1.6

2016-12-30 Thread Raymond Xie
Hello,

I see there is usually this way to load a csv to dataframe:

sqlContext = SQLContext(sc)

Employee_rdd = sc.textFile("\..\Employee.csv") \
    .map(lambda line: line.split(","))

Employee_df = Employee_rdd.toDF(['Employee_ID','Employee_name'])

Employee_df.show()

However in my case my csv has 100+ fields, which means toDF() will be very
lengthy.

Can anyone tell me a practical method to load the data?

Thank you very much.


Raymond