(๏̯͡๏) [deepuj...@gmail.com]
*Sent: *Thursday, August 06, 2015 12:41 AM Eastern Standard Time
*To: *Philip Weaver
*Cc: *user
*Subject: *Re: How to read gzip data in Spark - Simple question
How do I persist the RDD to HDFS?
On Wed, Aug 5, 2015 at 8:32 PM, Philip Weaver philip.wea
I have CSV data stored in gzip format on HDFS.
*With Pig*
a = load '/user/zeppelin/aggregatedsummary/2015/08/03/regular/part-m-3.gz' using PigStorage();
b = limit a 10;
This message means that java.util.Date is not supported by Spark DataFrame.
You'll need to use java.sql.Date, I believe.
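A minimal sketch of that suggestion, assuming the date strings look like the yyyy-MM-dd values in the sample row. The name formatStringAsSqlDate is hypothetical and only mirrors the helper from the thread; it wraps the parsed java.util.Date in a java.sql.Date.

```scala
import java.text.SimpleDateFormat

object SqlDateSketch {
  // Hypothetical variant of the thread's formatStringAsDate helper.
  // It parses a "yyyy-MM-dd" string and returns java.sql.Date instead
  // of java.util.Date.
  def formatStringAsSqlDate(dateStr: String): java.sql.Date = {
    val parsed: java.util.Date = new SimpleDateFormat("yyyy-MM-dd").parse(dateStr)
    // java.sql.Date is constructed from milliseconds since the epoch.
    new java.sql.Date(parsed.getTime)
  }

  def main(args: Array[String]): Unit = {
    println(formatStringAsSqlDate("2015-07-27"))
  }
}
```

For input that is already in yyyy-MM-dd form, java.sql.Date.valueOf(dateStr) should give the same result without SimpleDateFormat.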
On Wed, Aug 5, 2015 at 8:29 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com wrote:
That seems to be working. However, I see a new exception.
Code:
def formatStringAsDate(dateStr:
The parallelize method does not read the contents of a file. It simply
takes a collection and distributes it to the cluster. In this case, the
String is a collection of 67 characters.
Use sc.textFile instead of sc.parallelize, and it should work as you want.
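A rough sketch of the difference, assuming a Spark 1.x style SparkContext. The object and method names here are hypothetical, and the path is whatever you pass in. textFile should decompress a .gz file transparently based on its extension.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ReadGzipSketch {
  // parallelize only distributes the collection it is given. For a path
  // string, that is one element per character of the path.
  def charCount(sc: SparkContext, path: String): Long =
    sc.parallelize(path.toSeq).count()

  // textFile opens the file the path points to and returns its lines.
  def firstLines(sc: SparkContext, path: String, n: Int): Array[String] =
    sc.textFile(path).take(n)

  def main(args: Array[String]): Unit = {
    // The path is an assumption: pass the real HDFS location as the first argument.
    val path = args(0)
    val conf = new SparkConf()
      .setAppName("read-gzip-sketch")
      .setIfMissing("spark.master", "local[*]")
    val sc = new SparkContext(conf)
    try {
      println(s"parallelize saw ${charCount(sc, path)} characters")
      firstLines(sc, path, 10).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

A single gzip file is not splittable, so it is read by one task. Repartitioning after the read may help if the later steps are heavy.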
On Wed, Aug 5, 2015 at 8:12 PM, ÐΞ€ρ@Ҝ
That seems to be working. However, I see a new exception.
Code:
def formatStringAsDate(dateStr: String) = new
SimpleDateFormat("yyyy-MM-dd").parse(dateStr)
//(2015-07-27,12459,,31242,6,Daily,-999,2099-01-01,2099-01-02,1,0,0.1,0,1,-1,isGeo,,,204,694.0,1.9236856708701322E-4,0.0,-4.48,0.0,0.0,0.0,)
val
On Wed, Aug 5, 2015 at 8:29 PM, ÐΞ€ρ@Ҝ (๏̯͡๏)
Code:
val summary = rowStructText.map(s => s.split(",")).map(
{
s =>
Summary(formatStringAsDate(s(0)),
s(1).replaceAll("\"", "").toLong,
s(3).replaceAll("\"", "").toLong,
s(4).replaceAll("\"", "").toInt,
s(5).replaceAll("\"", ""),