Re: No auto decompress in Spark Java textFile function?
Thanks. What I noticed is that decompression works if the file is in HDFS, but not when it is a local file in a development environment. Does anyone else have the same problem?

On Wed, 9 Sep 2015 at 4:40 pm Akhil Das wrote:

> textFile used to work with .gz files; I haven't tested it on bz2 files. If
> it isn't decompressing by default, then what you have to do is use
> sc.wholeTextFiles and then decompress each record (each record being a file)
> with the corresponding codec.
>
> Thanks
> Best Regards
>
> On Tue, Sep 8, 2015 at 6:49 PM, Chris Teoh wrote:
>
>> Hi Folks,
>>
>> I tried using Spark v1.2 on bz2 files in Java but the behaviour is
>> different to the same textFile API call in Python and Scala.
>>
>> That being said, how do I go about reading .tar.bz2 files in Spark's Java
>> API?
>>
>> Thanks in advance
>> Chris
Re: No auto decompress in Spark Java textFile function?
textFile used to work with .gz files; I haven't tested it on bz2 files. If it isn't decompressing by default, then what you have to do is use sc.wholeTextFiles and then decompress each record (each record being a file) with the corresponding codec.

Thanks
Best Regards

On Tue, Sep 8, 2015 at 6:49 PM, Chris Teoh wrote:

> Hi Folks,
>
> I tried using Spark v1.2 on bz2 files in Java but the behaviour is
> different to the same textFile API call in Python and Scala.
>
> That being said, how do I go about reading .tar.bz2 files in Spark's Java
> API?
>
> Thanks in advance
> Chris
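
The "decompress each record with the corresponding codec" step from the reply above can be sketched outside Spark. This is a plain-Python illustration using only the standard library, not code from the thread. It shows one reason a .tar.bz2 file needs more than automatic bzip2 handling: removing the bzip2 layer leaves a tar stream, which still has to be unpacked before any text lines appear. The helper names `extract_tar_bz2` and `make_sample_archive` are hypothetical, and the sketch assumes the archive members are UTF-8 text.

```python
import bz2
import io
import tarfile


def extract_tar_bz2(data):
    """Return [(member_name, text)] for the regular files in a .tar.bz2 payload.

    Assumes every member is UTF-8 text (an assumption made for this sketch).
    """
    results = []
    # Mode "r:bz2" makes tarfile undo the bzip2 layer and then walk the tar entries.
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:bz2") as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue  # skip directories, links and other non-file entries
            handle = archive.extractfile(member)
            results.append((member.name, handle.read().decode("utf-8")))
    return results


def make_sample_archive():
    """Build a tiny in-memory .tar.bz2 so the sketch runs without any files on disk."""
    payload = b"line one\nline two\n"
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:bz2") as archive:
        info = tarfile.TarInfo(name="sample.txt")
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


if __name__ == "__main__":
    sample = make_sample_archive()
    # Removing only the bzip2 layer yields tar headers plus padding, not clean text.
    tar_bytes = bz2.decompress(sample)
    print("bytes after bzip2 only:", len(tar_bytes))
    for name, text in extract_tar_bz2(sample):
        print(name, text.splitlines())
```

In PySpark, a function like this could in principle be applied per file, for example `sc.binaryFiles(path).flatMap(lambda kv: extract_tar_bz2(kv[1]))`, in versions that provide `binaryFiles`. In Java the same two-layer idea applies, but the JDK ships no bzip2 or tar readers, so a library such as Apache Commons Compress (`BZip2CompressorInputStream` wrapped in `TarArchiveInputStream`) would be needed; treat that as a suggestion rather than something confirmed in this thread.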
No auto decompress in Spark Java textFile function?
Hi Folks,

I tried using Spark v1.2 on bz2 files in Java but the behaviour is different to the same textFile API call in Python and Scala.

That being said, how do I go about reading .tar.bz2 files in Spark's Java API?

Thanks in advance
Chris