Re: .tar.bz2 in spark

2016-12-08 Thread Jörn Franke
Tar is not supported out of the box. Just store the file as .json.bz2 without using tar.

> On 8 Dec 2016, at 20:18, Maurin Lenglart <mau...@cuberonlabs.com> wrote:
>
> Hi,
> I am trying to load a json file compress in .tar.bz2 but spark throw an error.
> I am using pys
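
To illustrate the suggestion above, here is a minimal sketch of writing and reading a plain .json.bz2 file. It assumes the data is one JSON object per line, which is what the Spark 1.6 JSON reader expects. The helper names `write_json_bz2` and `read_json_bz2` and the file name used with them are made up for this example, and `sqlContext` is assumed to be an existing SQLContext or HiveContext.

```python
import bz2
import json


def write_json_bz2(records, path):
    """Write one JSON object per line into a bzip2-compressed file (no tar layer)."""
    with bz2.BZ2File(path, "w") as out:
        for record in records:
            out.write((json.dumps(record) + "\n").encode("utf-8"))


def read_json_bz2(sqlContext, path):
    """Load the file into a DataFrame.

    Assumption: the Hadoop input layer chooses the bzip2 codec from the
    .bz2 file extension, so no extra option is passed here.
    """
    return sqlContext.read.json(path)
```

With files stored this way, something like `read_json_bz2(sqlContext, "events.json.bz2")` should be all that is needed on the Spark side.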

.tar.bz2 in spark

2016-12-08 Thread Maurin Lenglart
Hi, I am trying to load a JSON file compressed as .tar.bz2, but Spark throws an error. I am using PySpark with Spark 1.6.2 (Cloudera 5.9). What would be the best way to handle that? I don’t want to have a non-Spark job that just uncompresses the data… Thanks
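
One possible way to keep the unpacking inside the Spark job is to read each archive as raw bytes with `sc.binaryFiles` and unpack it with Python's `tarfile` module on the executors. The sketch below is only an illustration under some assumptions: the archive members contain one JSON object per line, each archive fits in the memory of a single executor, and `sc` and `sqlContext` already exist. The function names `extract_json_lines` and `load_tar_bz2_json` are invented for this example.

```python
import io
import tarfile


def extract_json_lines(tar_bytes):
    """Yield every non-empty line of every regular file in a .tar.bz2 archive."""
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:bz2") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories, links and other special entries
            for raw in archive.extractfile(member):
                line = raw.decode("utf-8").strip()
                if line:
                    yield line


def load_tar_bz2_json(sc, sqlContext, path):
    """Build a DataFrame from the JSON lines found in the archives under path."""
    # binaryFiles yields (file path, file content) pairs, one per archive
    archives = sc.binaryFiles(path)
    lines = archives.flatMap(lambda pair: extract_json_lines(pair[1]))
    # read.json also accepts an RDD of JSON strings
    return sqlContext.read.json(lines)
```

Note that each archive is decompressed by a single task, so a very large .tar.bz2 gets no parallelism within the file. Storing plain .json.bz2 files avoids that.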