Hi Cody, I wasn't aware there were different versions of the parquet format. What's the difference between "raw parquet" and the Hive-written parquet files?
As for your migration question, the approaches I've often seen are convert-on-read and convert-all-at-once. Apache Cassandra, for example, does both -- when upgrading between Cassandra versions that change the on-disk sstable format, it will do a convert-on-read as you access the sstables, or you can run the upgradesstables command to convert them all at once post-upgrade. (Rough sketch of what convert-on-read might look like over raw parquet at the end of this message.)

Andrew

On Fri, Oct 3, 2014 at 4:33 PM, Cody Koeninger <c...@koeninger.org> wrote:
> Wondering if anyone has thoughts on a path forward for parquet schema
> migrations, especially for people (like us) that are using raw parquet
> files rather than Hive.
>
> So far we've gotten away with reading old files, converting, and writing to
> new directories, but that obviously becomes problematic above a certain
> data size.
>
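P.S. Here's a rough sketch of what convert-on-read could look like over raw parquet directories, using the Spark 1.x SQL API. Treat it as illustrative only: the oldDir/newDir layout, converting a whole directory lazily on first access (Cassandra does it per-sstable), and the example SELECT that performs the schema change are all assumptions on my part, not a recommendation.

    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.spark.sql.{SQLContext, SchemaRDD}

    object ConvertOnRead {
      // Serve reads from newDir; if it doesn't exist yet, convert oldDir first.
      def load(sqlContext: SQLContext, oldDir: String, newDir: String): SchemaRDD = {
        val fs = FileSystem.get(sqlContext.sparkContext.hadoopConfiguration)
        if (!fs.exists(new Path(newDir))) {
          // First access under the new schema: read the old-format files and
          // rewrite them under the new layout. This is the convert-all-at-once
          // step, just deferred until someone actually reads the data.
          val old = sqlContext.parquetFile(oldDir)
          old.registerTempTable("old_events")
          // The conversion itself is schema-specific; the column names here
          // (user_id, ts, new_field) are hypothetical, and filling a newly
          // added column with a null default is shown purely as an example.
          val converted = sqlContext.sql(
            "SELECT user_id, ts, CAST(null AS STRING) AS new_field FROM old_events")
          converted.saveAsParquetFile(newDir)
        }
        sqlContext.parquetFile(newDir)
      }
    }

The convert-all-at-once variant would be the same body without the exists() check, run once as a standalone job right after you roll out the new schema.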