Viju,

JSON data are not *typed* - Parquet requires types - and Avro is a good
intermediate, as it supplies the type information that Parquet needs.

So you will need to convert your messages into Avro (which is a best
practice in *Kafka* anyway).

The reason is that, for example, in the following JSON you have to decide
whether the value is an INT or a LONG:

   "age" : 34
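To see the ambiguity concretely, here is a minimal Python sketch (pure
stdlib, purely illustrative): the JSON parser hands back a bare number with
no width attached, so only a schema you supply can decide between Avro's
32-bit `int` and 64-bit `long`.

```python
import json

# JSON carries no width information: the parser just says "number".
msg = json.loads('{"age": 34}')
print(type(msg["age"]).__name__)  # plain "int" -- nothing says 32 vs 64 bit

# The same parsed value fits either Avro type; a schema has to choose.
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
as_avro_type = "int" if INT32_MIN <= msg["age"] <= INT32_MAX else "long"
print(as_avro_type)
```

The point is that the choice lives in your schema, not in the message.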

How to convert JSON to *Avro*?

- You could get away with using a library like Avro4s (see GitHub), which
does a best-effort conversion - but that is not a very robust solution
- The other way is to type the conversion explicitly: read each JSON
message and construct an Avro record with hard-coded types
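The second option can be sketched roughly as follows (Python stdlib only;
the field names and schema shape are illustrative, not from this thread -
with a real Avro library you would build a record against an .avsc schema
instead of a plain dict):

```python
import json

# Hand-written "schema": field name -> (Avro-style type, Python converter).
# Hypothetical fields for illustration only.
USER_SCHEMA = {
    "name": ("string", str),
    "age":  ("int", int),
    "id":   ("long", int),
}

def json_to_typed_record(raw: str) -> dict:
    """Parse one JSON message and coerce every field to its declared type.

    This mirrors the 'hard-coded types' approach: each field is looked up
    in an explicit schema, so an unexpected or missing field fails loudly
    instead of silently producing an untyped value.
    """
    doc = json.loads(raw)
    record = {}
    for field, (avro_type, convert) in USER_SCHEMA.items():
        if field not in doc:
            raise ValueError(f"missing required field: {field}")
        record[field] = convert(doc[field])
    return record

rec = json_to_typed_record('{"name": "Viju", "age": 34, "id": 12345678901}')
print(rec)
```

Tedious, yes, but every column now has a known type, which is exactly what
the Parquet writer needs.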

Then all you need to do is store those Avro records as *Parquet*.

The bottom line is that Parquet cannot store raw JSON messages, because it
requires types.


- Antonios

On Tue, Sep 27, 2016 at 8:25 PM, VIJJU CH <vijju5...@gmail.com> wrote:

> Hello,
>
> We have a scenario where we currently use Apache Kafka. We have Kafka
> messages in JSON format in a Kafka topic, and from that topic we send
> the JSON messages to Amazon S3.
>
> Can we read the JSON messages from the Kafka topic, convert them to
> Parquet format, and store them in S3?
>
> Please reply at your earliest convenience.
>
> Thanks,
> Vijju
>
