1) You don't have to shell out to a compiler to generate code... but that's complicated :).
2) Avro can be dynamic. I haven't played with that side of the world, but
this tutorial might help get you started (there's also a rough sketch below):
https://github.com/AndreSchumacher/avro-parquet-spark-example

3) Do note that you should have one schema per dataset (maybe a schema you
didn't know until you started writing the dataset, but a schema
nonetheless). If your notion is to have a collection of totally different
objects, Parquet is a bad choice.
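In case a concrete starting point helps, here's a rough, untested sketch of
the GenericRecord route from point 2. It assumes parquet-avro and Avro 1.7+
are on the classpath (this uses the old `parquet.avro` package name and the
Path/Schema constructor; newer releases live under `org.apache.parquet.avro`
and have a builder API). The class name, field names, and output path are
just placeholders:

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.generic.GenericRecordBuilder;
import org.apache.hadoop.fs.Path;
import parquet.avro.AvroParquetWriter;

public class DynamicAvroToParquet {
  public static void main(String[] args) throws Exception {
    // Build (or receive) the Avro schema at runtime: no IDL, no codegen,
    // no classloading of generated POJOs.
    String schemaJson = "{\"type\":\"record\",\"name\":\"DynamicRecord\","
        + "\"fields\":[{\"name\":\"name\",\"type\":\"string\"},"
        + "{\"name\":\"value\",\"type\":\"int\"}]}";
    Schema schema = new Schema.Parser().parse(schemaJson);

    // GenericRecord holds values by field name, so no concrete class is needed.
    GenericRecord record = new GenericRecordBuilder(schema)
        .set("name", "example")
        .set("value", 42)
        .build();

    // AvroParquetWriter maps the Avro schema onto a Parquet schema for you.
    AvroParquetWriter<GenericRecord> writer =
        new AvroParquetWriter<GenericRecord>(new Path("dynamic.parquet"), schema);
    try {
      writer.write(record);
    } finally {
      writer.close();
    }
  }
}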
D

On Tue, Aug 26, 2014 at 11:14 AM, Jim <[email protected]> wrote:
>
> Hello all,
>
> I couldn't find a user list so my apologies if this falls in the wrong
> place. I'm looking for a little guidance. I'm a newbie with respect to
> Parquet.
>
> We have a use case where we don't want concrete POJOs to represent data in
> our store. It's dynamic in that each data set is unique and dynamic and we
> need to handle incoming datasets at runtime.
>
> Examples of how to write to Parquet are sparse and all of the ones I could
> find assume Thrift/Avro/Protobuf IDL and generated schema and POJOs. I
> don't want to dynamically generate an IDL, shell out to a compiler, and
> classload the results in order to use Parquet. Is there an example that
> does what I'm looking for?
>
> Thanks
> Jim