Thanks. Since it is just a snippet, do you mean that Inflater is coming from ZLIB?
Eran
On Fri, Dec 18, 2015 at 11:37 AM Akhil Das <ak...@sigmoidanalytics.com> wrote:

> Something like this? This one uses the ZLIB compression, you can replace
> the decompression logic with the GZip one in your case.
>
> compressedStream.map(x => {
>   val inflater = new Inflater()
>   inflater.setInput(x.getPayload)
>   val decompressedData = new Array[Byte](x.getPayload.size * 2)
>   var count = inflater.inflate(decompressedData)
>   var finalData = decompressedData.take(count)
>   while (count > 0) {
>     count = inflater.inflate(decompressedData)
>     finalData = finalData ++ decompressedData.take(count)
>   }
>   new String(finalData)
> })
>
> Thanks
> Best Regards
>
> On Wed, Dec 16, 2015 at 10:02 PM, Eran Witkon <eranwit...@gmail.com> wrote:
>
>> Hi,
>> I have a few JSON files in which one of the fields is a binary field -
>> this field is the output of running GZIP on a JSON stream and compressing
>> it into the binary field.
>>
>> Now I want to decompress the field and get the output JSON.
>> I was thinking of running a map operation and passing a function to the
>> map operation which will decompress each JSON file.
>> The above function will find the right field in the outer JSON and then
>> run GUNZIP on it.
>>
>> 1) Is this a valid practice for a Spark map job?
>> 2) Any pointers on how to do that?
>>
>> Eran
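For the GZIP case in the original question, a rough, untested sketch could swap the Inflater logic for java.util.zip.GZIPInputStream (which lives in the same package as Inflater). It reuses the hypothetical getPayload accessor from Akhil's snippet, assumed to return the compressed bytes of the inner JSON:

import java.io.ByteArrayInputStream
import java.util.zip.GZIPInputStream
import scala.io.Source

compressedStream.map { x =>
  // Wrap the compressed bytes and read the decompressed JSON back as a UTF-8 string
  val in = new GZIPInputStream(new ByteArrayInputStream(x.getPayload))
  try Source.fromInputStream(in, "UTF-8").mkString
  finally in.close()
}

The same idea applies when mapping over the JSON records themselves: parse the outer JSON, pull out the binary field, decompress it as above, and return the decoded inner JSON string.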