I would expect an SQL query on c to fail, because c would not be known in
the schema of the older Parquet files.
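
For what it's worth, here is a minimal sketch of how I'd check that
behaviour empirically, assuming a recent Spark build whose Parquet reader
supports schema merging (the path and session setup below are placeholders,
not anything from the actual tables):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("schema-evolution-check").getOrCreate()

// Old files contain (a: Int, b: Int); newer files also contain (c: String).
// With mergeSchema enabled, the reader unions the Parquet footers into one schema.
val df = spark.read
  .option("mergeSchema", "true")
  .parquet("/data/events/")          // assumed path

df.printSchema()                     // should list a, b, and c
// Whether rows from the old files then surface c as null, or the query fails,
// is exactly the behaviour in question here.
df.select("a", "b", "c").show()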

What I'd be very interested in is how to add a new column as an incremental
new Parquet file, and then be able to join the existing and new files
efficiently; i.e., somehow guarantee that for every row in the old Parquet
file, the corresponding row in the new file is stored on the same node, so
that joins are local.
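
A rough sketch of one way to get close to that, assuming a newer Spark than
we are discussing in this thread: write both sides bucketed by a shared row
key so the join can stay shuffle-free. Bucketing does not strictly pin
corresponding blocks to the same node, but it avoids moving rows around at
join time. The paths, the rowId key, the bucket count, and the table names
are all assumptions for illustration:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("local-join-sketch").enableHiveSupport().getOrCreate()

// base holds the original columns; extra holds only the row key plus the new column c.
val base  = spark.read.parquet("/data/base/")       // assumed paths
val extra = spark.read.parquet("/data/extra_c/")

val numBuckets = 64                                 // assumed; size to the data

// Bucketing both sides by the same key and bucket count keeps matching rows in
// corresponding buckets, so the join below does not need a full shuffle.
base.write.bucketBy(numBuckets, "rowId").sortBy("rowId").saveAsTable("base_bucketed")
extra.write.bucketBy(numBuckets, "rowId").sortBy("rowId").saveAsTable("extra_bucketed")

val joined = spark.table("base_bucketed")
  .join(spark.table("extra_bucketed"), Seq("rowId"))

joined.select("a", "b", "c").show()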

On Fri, Nov 21, 2014 at 10:03 AM, Sadhan Sood <sadhan.s...@gmail.com> wrote:

> We create the table definition by reading the parquet file for its schema
> and store it in the hive metastore. But if someone adds a new column to the
> schema, and we rescan the schema from the new parquet files and update the
> table definition, would queries on the table still work?
>
> So, old table has -> Int a, Int b
> new table -> Int a, Int b, String c
>
> but older parquet files don't have String c, so on querying the table
> would it return null for column c from older files and data from newer
> files, or fail?
>
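
On the metastore side of your question, one way to pick up the new column
without rescanning every file would be to alter the existing table
definition in place. I haven't verified how Hive behaves afterwards on the
older files, so treat this as a sketch (the table name and the Spark SQL
session are assumed):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("metastore-update-sketch").enableHiveSupport().getOrCreate()

// "events" is a hypothetical Parquet-backed table already registered in the
// metastore with columns (a INT, b INT). Add c to the definition in place
// instead of rebuilding it from a rescan of the newest files:
spark.sql("ALTER TABLE events ADD COLUMNS (c STRING)")

// The open question: for rows stored in the older files, does this return
// null in c, or fail because c is absent from their footers?
spark.sql("SELECT a, b, c FROM events").show()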
