1) You don't have to shell out to a compiler to generate code... but that's complicated :).
2) Avro can be dynamic. I haven't played with that side of the world, but
this tutorial might help get you started (there's also a rough sketch below):
https://github.com/AndreSchumacher/avro-parquet-spark-example

3) Do note that you should have one schema per dataset (maybe a schema you
didn't know until you started writing the dataset, but a schema
nonetheless). If your notion is to have a collection of totally different
objects, Parquet is a bad choice.
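In case a concrete starting point helps, here's a rough, untested sketch of
the GenericRecord route from point 2. It assumes parquet-avro and Avro 1.7+
are on the classpath (this uses the old `parquet.avro` package name and the
Path/Schema constructor; newer releases live under `org.apache.parquet.avro`
and have a builder API). The class name, field names, and output path are
just placeholders:

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.generic.GenericRecordBuilder;
import org.apache.hadoop.fs.Path;
import parquet.avro.AvroParquetWriter;

public class DynamicAvroToParquet {
  public static void main(String[] args) throws Exception {
    // Build (or receive) the Avro schema at runtime: no IDL, no codegen,
    // no classloading of generated POJOs.
    String schemaJson = "{\"type\":\"record\",\"name\":\"DynamicRecord\","
        + "\"fields\":[{\"name\":\"name\",\"type\":\"string\"},"
        + "{\"name\":\"value\",\"type\":\"int\"}]}";
    Schema schema = new Schema.Parser().parse(schemaJson);

    // GenericRecord holds values by field name, so no concrete class is needed.
    GenericRecord record = new GenericRecordBuilder(schema)
        .set("name", "example")
        .set("value", 42)
        .build();

    // AvroParquetWriter maps the Avro schema onto a Parquet schema for you.
    AvroParquetWriter<GenericRecord> writer =
        new AvroParquetWriter<GenericRecord>(new Path("dynamic.parquet"), schema);
    try {
      writer.write(record);
    } finally {
      writer.close();
    }
  }
}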
D

On Tue, Aug 26, 2014 at 11:14 AM, Jim <[email protected]> wrote:
>
> Hello all,
>
> I couldn't find a user list so my apologies if this falls in the wrong
> place. I'm looking for a little guidance. I'm a newbie with respect to
> Parquet.
>
> We have a use case where we don't want concrete POJOs to represent data in
> our store. It's dynamic in that each data set is unique and dynamic and we
> need to handle incoming datasets at runtime.
>
> Examples of how to write to Parquet are sparse and all of the ones I could
> find assume Thrift/Avro/Protobuf IDL and generated schema and POJOs. I
> don't want to dynamically generate an IDL, shell out to a compiler, and
> classload the results in order to use Parquet. Is there an example that
> does what I'm looking for?
>
> Thanks
> Jim