Thanks. Since it is just a snippet, do you mean that Inflater is coming from ZLIB?
Eran
On Fri, Dec 18, 2015 at 11:37 AM Akhil Das <ak...@sigmoidanalytics.com> wrote:

> Something like this? This one uses the ZLIB compression, you can replace
> the decompression logic with the GZip one in your case.
>
> compressedStream.map(x => {
>   val inflater = new Inflater()
>   inflater.setInput(x.getPayload)
>   val decompressedData = new Array[Byte](x.getPayload.size * 2)
>   var count = inflater.inflate(decompressedData)
>   var finalData = decompressedData.take(count)
>   while (count > 0) {
>     count = inflater.inflate(decompressedData)
>     finalData = finalData ++ decompressedData.take(count)
>   }
>   new String(finalData)
> })
>
> Thanks
> Best Regards
>
> On Wed, Dec 16, 2015 at 10:02 PM, Eran Witkon <eranwit...@gmail.com> wrote:
>
>> Hi,
>> I have a few JSON files in which one of the fields is a binary field -
>> this field is the output of running GZIP on a JSON stream and compressing
>> it into the binary field.
>>
>> Now I want to decompress the field and get the output JSON.
>> I was thinking of running a map operation and passing a function to the
>> map operation which will decompress each JSON file.
>> The above function will find the right field in the outer JSON and then
>> run GUNZIP on it.
>>
>> 1) Is this a valid practice for a Spark map job?
>> 2) Any pointers on how to do that?
>>
>> Eran
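For the GZIP case in the original question, a rough, untested sketch could swap the Inflater logic for java.util.zip.GZIPInputStream (which lives in the same package as Inflater). It reuses the hypothetical getPayload accessor from Akhil's snippet, assumed to return the compressed bytes of the inner JSON:

import java.io.ByteArrayInputStream
import java.util.zip.GZIPInputStream
import scala.io.Source

compressedStream.map { x =>
  // Wrap the compressed bytes and read the decompressed JSON back as a UTF-8 string
  val in = new GZIPInputStream(new ByteArrayInputStream(x.getPayload))
  try Source.fromInputStream(in, "UTF-8").mkString
  finally in.close()
}

The same idea applies when mapping over the JSON records themselves: parse the outer JSON, pull out the binary field, decompress it as above, and return the decoded inner JSON string.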