Re: No auto decompress in Spark Java textFile function?

2015-09-09 Thread Chris Teoh
Thanks. What I noticed is that decompression works when the file is in HDFS,
but not when it is a local file in a development environment.

Does anyone else have the same problem?


Re: No auto decompress in Spark Java textFile function?

2015-09-08 Thread Akhil Das
textFile used to work with .gz files; I haven't tested it on bz2 files. If
it isn't decompressing by default, then you will have to use
sc.wholeTextFiles and decompress each record (each record being a whole
file) with the corresponding codec.
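The per-file decompression step described above might look roughly like the following stdlib-only Java helper. It is a sketch, not Spark or Hadoop API: `RecordDecompressor` and `decompressRecord` are hypothetical names, the codec is picked from the file-name extension (similar in spirit to Hadoop's `CompressionCodecFactory`), and only gzip is implemented because the JDK ships no bzip2 decoder. It also swaps in raw bytes for the String records that `wholeTextFiles` produces, on the assumption that input comes from `JavaSparkContext.binaryFiles` (whose values are `PortableDataStream`s, convertible with `toArray()`), since compressed data is binary and may not survive being decoded as text.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

// Hypothetical helper: decompresses one whole-file record, choosing the codec by extension.
final class RecordDecompressor {

    static String decompressRecord(String fileName, byte[] bytes) throws IOException {
        if (fileName.endsWith(".gz")) {
            // gzip is supported by the JDK itself.
            try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(bytes))) {
                return new String(readAll(in), StandardCharsets.UTF_8);
            }
        }
        if (fileName.endsWith(".bz2")) {
            // No bzip2 decoder in the JDK; this branch would need e.g. Hadoop's
            // BZip2Codec or Apache Commons Compress (assumption, not shown here).
            throw new IOException("bzip2 needs a non-JDK codec: " + fileName);
        }
        // Anything else is treated as uncompressed UTF-8 text.
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Java 8-compatible equivalent of InputStream.readAllBytes().
    private static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }
}
```

In a job it could be wired in along the lines of `sc.binaryFiles(dir).map(kv -> RecordDecompressor.decompressRecord(kv._1(), kv._2().toArray()))` (untested, Java 8 lambdas assumed). Each file is held in memory whole, so this suits many small files better than a few very large ones.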

Thanks
Best Regards



No auto decompress in Spark Java textFile function?

2015-09-08 Thread Chris Teoh
Hi Folks,

I tried using Spark v1.2 on bz2 files in Java, but the behaviour differs
from the same textFile API call in Python and Scala.

That being said, how do I read .tar.bz2 files with Spark's Java API?

Thanks in advance
Chris
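A .tar.bz2 file is a tar archive that was then bzip2-compressed, so even if the bzip2 layer is decompressed (by textFile or by a codec), the result is still a tar stream: 512-byte headers interleaved with the member files' contents, not plain lines. The stdlib-only Java sketch below shows what splitting that stream into its member files involves. `TarEntries` is a hypothetical helper; it assumes the bytes have already been bzip2-decompressed and only handles simple ustar-style entries (names up to 100 bytes, octal sizes). For real archives a library such as Apache Commons Compress (`BZip2CompressorInputStream` wrapped in `TarArchiveInputStream`) is the more robust choice, fed with each file's bytes from `JavaSparkContext.binaryFiles`.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: splits an already-decompressed tar byte array into its regular files.
final class TarEntries {
    private static final int BLOCK = 512;

    static Map<String, byte[]> list(byte[] tar) throws IOException {
        Map<String, byte[]> entries = new LinkedHashMap<>();
        int pos = 0;
        while (pos + BLOCK <= tar.length) {
            // The name is the first 100 bytes; an empty name marks the end-of-archive blocks.
            String name = field(tar, pos, 100);
            if (name.isEmpty()) {
                break;
            }
            // The size is an octal ASCII number at offset 124 (12 bytes).
            String sizeField = field(tar, pos + 124, 12).trim();
            long size = sizeField.isEmpty() ? 0 : Long.parseLong(sizeField, 8);
            char type = (char) tar[pos + 156];
            int dataStart = pos + BLOCK;
            if (dataStart + size > tar.length) {
                throw new IOException("Truncated tar entry: " + name);
            }
            // Regular files have type '0' (or NUL in old archives); others are skipped.
            if (type == '0' || type == '\0') {
                byte[] data = new byte[(int) size];
                System.arraycopy(tar, dataStart, data, 0, (int) size);
                entries.put(name, data);
            }
            // File data is padded up to the next 512-byte boundary.
            long padded = (size + BLOCK - 1) / BLOCK * BLOCK;
            pos = dataStart + (int) padded;
        }
        return entries;
    }

    // Reads a NUL-terminated ASCII header field.
    private static String field(byte[] buf, int off, int len) {
        int end = off;
        while (end < off + len && buf[end] != 0) {
            end++;
        }
        return new String(buf, off, end - off, StandardCharsets.US_ASCII);
    }
}
```

Each entry's bytes could then be decoded and split into lines, which is roughly what textFile would have given per member file had the files not been archived together.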