Re: Getting an error when trying to read a GZIPPED file

2015-09-04 Thread Akhil Das
Are you calling .cache() after sc.textFile? If so, you can set the
StorageLevel to MEMORY_AND_DISK to avoid that.
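As a minimal sketch of the suggestion above (assuming a local Spark job and the "foo.gz" file from the original mail; the object and app names are illustrative):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object GzipCacheExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("GzipCacheExample").setMaster("local[*]"))

    // A gzip file is not splittable, so the whole decompressed file
    // lands in a single partition; caching it with the default
    // .cache() (i.e. MEMORY_ONLY) can then exceed the storage limit.
    val lines = sc.textFile("foo.gz")

    // MEMORY_AND_DISK spills partitions that do not fit in memory to
    // disk instead of failing to cache them in memory.
    lines.persist(StorageLevel.MEMORY_AND_DISK)

    println(lines.count())
    sc.stop()
  }
}
```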

Thanks
Best Regards

On Thu, Sep 3, 2015 at 10:11 AM, Spark Enthusiast wrote:

> Folks,
>
> I have an input file which is gzipped. When I read it with
> sc.textFile("foo.gz"), I see the following problem. Can someone help me
> fix this?
>
> 15/09/03 10:05:32 INFO deprecation: mapred.job.id is deprecated. Instead,
> use mapreduce.job.id
> 15/09/03 10:05:32 INFO CodecPool: Got brand-new decompressor [.gz]
> 15/09/03 10:06:15 WARN MemoryStore: Not enough space to cache rdd_2_0 in
> memory! (computed 216.3 MB so far)
> 15/09/03 10:06:15 INFO MemoryStore: Memory use = 156.2 KB (blocks) + 213.1
> MB (scratch space shared across 1 thread(s)) = 213.3 MB. Storage limit =
> 265.1 MB.
>


Getting an error when trying to read a GZIPPED file

2015-09-02 Thread Spark Enthusiast
Folks,
I have an input file which is gzipped. When I read it with
sc.textFile("foo.gz"), I see the following problem. Can someone help me
fix this?

15/09/03 10:05:32 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
15/09/03 10:05:32 INFO CodecPool: Got brand-new decompressor [.gz]
15/09/03 10:06:15 WARN MemoryStore: Not enough space to cache rdd_2_0 in memory! (computed 216.3 MB so far)
15/09/03 10:06:15 INFO MemoryStore: Memory use = 156.2 KB (blocks) + 213.1 MB (scratch space shared across 1 thread(s)) = 213.3 MB. Storage limit = 265.1 MB.