Re: How to read gzip data in Spark - Simple question

2015-08-06 Thread ๏̯͡๏
(๏̯͡๏) [deepuj...@gmail.com] Sent: Thursday, August 06, 2015 12:41 AM Eastern Standard Time To: Philip Weaver Cc: user Subject: Re: How to read gzip data in Spark - Simple question How do I persist the RDD to HDFS? On Wed, Aug 5, 2015 at 8:32 PM, Philip Weaver philip.wea

How to read gzip data in Spark - Simple question

2015-08-05 Thread ๏̯͡๏
I have CSV data stored gzip-compressed on HDFS. With Pig: a = load '/user/zeppelin/aggregatedsummary/2015/08/03/regular/part-m-3.gz' using PigStorage(); b = limit a 10

Re: How to read gzip data in Spark - Simple question

2015-08-05 Thread Philip Weaver
This message means that java.util.Date is not supported by Spark DataFrame. You'll need to use java.sql.Date, I believe. On Wed, Aug 5, 2015 at 8:29 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com wrote: That seems to be working. However, I see a new exception. Code: def formatStringAsDate(dateStr:
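A minimal sketch of the conversion suggested in the reply above, assuming the input strings look like 2015-07-27. The helper name formatStringAsSqlDate and the yyyy-MM-dd pattern are illustrative assumptions, not code from the thread:

```scala
import java.text.SimpleDateFormat

// Hypothetical helper: parse a yyyy-MM-dd string and wrap the result in
// java.sql.Date, the date type the reply says Spark DataFrames accept.
def formatStringAsSqlDate(dateStr: String): java.sql.Date = {
  // SimpleDateFormat.parse returns a java.util.Date
  val parsed: java.util.Date = new SimpleDateFormat("yyyy-MM-dd").parse(dateStr)
  // java.sql.Date is built from milliseconds since the epoch
  new java.sql.Date(parsed.getTime)
}
```

A case class field that held the result of formatStringAsDate would then be declared as java.sql.Date instead of java.util.Date.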

RE: How to read gzip data in Spark - Simple question

2015-08-05 Thread Ganelin, Ilya
To: Philip Weaver Cc: user Subject: Re: How to read gzip data in Spark - Simple question How do I persist the RDD to HDFS? On Wed, Aug 5, 2015 at 8:32 PM, Philip Weaver philip.wea...@gmail.com wrote: This message means that java.util.Date is not supported by Spark

Re: How to read gzip data in Spark - Simple question

2015-08-05 Thread Philip Weaver
The parallelize method does not read the contents of a file. It simply takes a collection and distributes it to the cluster. In this case, the String is a collection of 67 characters. Use sc.textFile instead of sc.parallelize, and it should work as you want. On Wed, Aug 5, 2015 at 8:12 PM, ÐΞ€ρ@Ҝ
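The difference described in the reply above can be sketched as follows. This is an illustration under stated assumptions, not the poster's code: the object name, the helper names, the fallback to a local master, and the default path are all hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object GzipReadSketch {
  // parallelize distributes whatever collection it is given. A String is a
  // sequence of characters, so this yields one element per character of the
  // path itself and never opens the file.
  def distributeChars(sc: SparkContext, path: String): RDD[Char] =
    sc.parallelize(path.toSeq)

  // textFile opens the file at `path` and yields one element per line.
  // Files ending in .gz are decompressed by the underlying Hadoop input format.
  def readLines(sc: SparkContext, path: String): RDD[String] =
    sc.textFile(path)

  def main(args: Array[String]): Unit = {
    // Assumption: use a local master when spark-submit does not supply one.
    val conf = new SparkConf()
      .setAppName("gzip-read-sketch")
      .setIfMissing("spark.master", "local[*]")
    val sc = new SparkContext(conf)
    val path = args.headOption.getOrElse("/tmp/example.csv.gz") // hypothetical path

    println(distributeChars(sc, path).count()) // length of the path string
    readLines(sc, path).take(10).foreach(println) // first ten lines of the file
    sc.stop()
  }
}
```

Gzip is not a splittable format, so each .gz file is read by a single task; calling repartition on the resulting RDD is one way to spread later work across the cluster.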

Re: How to read gzip data in Spark - Simple question

2015-08-05 Thread ๏̯͡๏
That seems to be working. However, I see a new exception. Code: def formatStringAsDate(dateStr: String) = new SimpleDateFormat("yyyy-MM-dd").parse(dateStr) //(2015-07-27,12459,,31242,6,Daily,-999,2099-01-01,2099-01-02,1,0,0.1,0,1,-1,isGeo,,,204,694.0,1.9236856708701322E-4,0.0,-4.48,0.0,0.0,0.0,) val

Re: How to read gzip data in Spark - Simple question

2015-08-05 Thread ๏̯͡๏
How do I persist the RDD to HDFS? On Wed, Aug 5, 2015 at 8:32 PM, Philip Weaver philip.wea...@gmail.com wrote: This message means that java.util.Date is not supported by Spark DataFrame. You'll need to use java.sql.Date, I believe. On Wed, Aug 5, 2015 at 8:29 PM, ÐΞ€ρ@Ҝ (๏̯͡๏)

Re: How to read gzip data in Spark - Simple question

2015-08-05 Thread ๏̯͡๏
Code: val summary = rowStructText.map(s => s.split(",")).map( { s => Summary(formatStringAsDate(s(0)), s(1).replaceAll("\"", "").toLong, s(3).replaceAll("\"", "").toLong, s(4).replaceAll("\"", "").toInt, s(5).replaceAll("\"", ""),