Hi,

I'm wondering why DRILL-4120 has been pushed back to 1.6.

I have no idea if we are the only ones using directory pruning with Avro,
but we use Avro for streaming/fresh data before a Parquet conversion, and
this would be a welcome fix.

Pet peeve - Avro Schema validation.

Some facts:

   - The Map structure supported by Avro cannot be validated with a schema,
   as it allows keys to vary and only ensures the data type of the value.

   - An evolving schema will fail with the current Avro validation when
   directory pruning is used, unless all file headers, even in the pruned
   directories, are scanned.

   - Schema validation in Avro and schema validation in Parquet are
   different.
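
To illustrate the first point, here is a minimal Python sketch. It is not
Avro's or Drill's actual validation code; conforms_to_map and the sample
records are made up for illustration, and the check only handles a map of
longs. It shows that a map schema fixes the value type but says nothing
about which keys appear, so two records with no keys in common both pass.

```python
# Hypothetical, simplified check; NOT the Avro library's own validation.
# An Avro map schema names only the value type; keys are always strings.
MAP_SCHEMA = {"type": "map", "values": "long"}


def conforms_to_map(schema, datum):
    """Return True if datum is a dict with string keys and integer values."""
    if schema.get("type") != "map" or schema.get("values") != "long":
        raise ValueError("this sketch only handles a map of longs")
    return isinstance(datum, dict) and all(
        isinstance(key, str)
        and isinstance(value, int)
        and not isinstance(value, bool)  # bool is a subclass of int in Python
        for key, value in datum.items()
    )


record_a = {"clicks": 3, "views": 10}
record_b = {"errors": 1}        # entirely different keys, same schema
record_c = {"clicks": "three"}  # wrong value type

for record in (record_a, record_b, record_c):
    print(sorted(record), conforms_to_map(MAP_SCHEMA, record))
```

Only the third record is rejected, and only because of its value type.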

This, and in my opinion many other things, mean that strict schema
validation in Avro should be an opt-in arrangement for those who want to
stop evolving their schema and put all their entries in a single file /
directory.

Additionally, Avro 1.8 is just out, and both it and parquet-avro now
support timestamp fields. It would be a great benefit to have proper date
/ timestamp handling in Avro and in the Avro->Parquet conversion.
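
For reference, a minimal Python sketch of what such a field looks like,
assuming the timestamp-millis logical type that annotates a long. The
record name "Event", the field name "event_time" and both helper functions
are made up for illustration; this uses only the standard library, not the
Avro or Drill APIs.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical record; only the inner "type" object reflects the
# timestamp-millis logical type (a long annotated with a logicalType).
EVENT_SCHEMA = {
    "type": "record",
    "name": "Event",
    "fields": [
        {
            "name": "event_time",
            "type": {"type": "long", "logicalType": "timestamp-millis"},
        },
    ],
}

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_timestamp_millis(dt):
    """Milliseconds since the Unix epoch; dt must be timezone-aware."""
    delta = dt - EPOCH
    return (
        delta.days * 86_400_000
        + delta.seconds * 1_000
        + delta.microseconds // 1_000
    )


def from_timestamp_millis(millis):
    """Inverse of to_timestamp_millis; returns a UTC datetime."""
    return EPOCH + timedelta(milliseconds=millis)


stamp = datetime(2016, 1, 1, tzinfo=timezone.utc)
millis = to_timestamp_millis(stamp)
print(millis, from_timestamp_millis(millis).isoformat())
```

Without the logicalType annotation a reader sees only a plain long, which
is why support on both the Avro and the Parquet side matters for the
conversion.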

Yours truly,
  - The Slightly Disgruntled Drill-Avro User
