Hi Cody, I wasn't aware there were different versions of the parquet format. What's the difference between "raw parquet" and the Hive-written parquet files?
As for your migration question, the approaches I've often seen are convert-on-read and convert-all-at-once. Apache Cassandra, for example, does both -- when upgrading between Cassandra versions that change the on-disk sstable format, it will do a convert-on-read as you access the sstables, or you can run the upgradesstables command to convert them all at once post-upgrade. (Rough sketch of what convert-on-read might look like over raw parquet at the end of this message.)

Andrew

On Fri, Oct 3, 2014 at 4:33 PM, Cody Koeninger <c...@koeninger.org> wrote:
> Wondering if anyone has thoughts on a path forward for parquet schema
> migrations, especially for people (like us) that are using raw parquet
> files rather than Hive.
>
> So far we've gotten away with reading old files, converting, and writing to
> new directories, but that obviously becomes problematic above a certain
> data size.
>
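P.S. Here's a rough sketch of what convert-on-read could look like over raw parquet directories, using the Spark 1.x SQL API. Treat it as illustrative only: the oldDir/newDir layout, converting a whole directory lazily on first access (Cassandra does it per-sstable), and the example SELECT that performs the schema change are all assumptions on my part, not a recommendation.

    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.spark.sql.{SQLContext, SchemaRDD}

    object ConvertOnRead {
      // Serve reads from newDir; if it doesn't exist yet, convert oldDir first.
      def load(sqlContext: SQLContext, oldDir: String, newDir: String): SchemaRDD = {
        val fs = FileSystem.get(sqlContext.sparkContext.hadoopConfiguration)
        if (!fs.exists(new Path(newDir))) {
          // First access under the new schema: read the old-format files and
          // rewrite them under the new layout. This is the convert-all-at-once
          // step, just deferred until someone actually reads the data.
          val old = sqlContext.parquetFile(oldDir)
          old.registerTempTable("old_events")
          // The conversion itself is schema-specific; the column names here
          // (user_id, ts, new_field) are hypothetical, and filling a newly
          // added column with a null default is shown purely as an example.
          val converted = sqlContext.sql(
            "SELECT user_id, ts, CAST(null AS STRING) AS new_field FROM old_events")
          converted.saveAsParquetFile(newDir)
        }
        sqlContext.parquetFile(newDir)
      }
    }

The convert-all-at-once variant would be the same body without the exists() check, run once as a standalone job right after you roll out the new schema.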