Re: Spark-hive parquet schema evolution

2015-07-22 Thread Cheng Lian
Yeah, the benefit of `saveAsTable` is that you don't need to deal with the schema explicitly, while the benefit of ALTER TABLE is that you still have a standard vanilla Hive table. Cheng On 7/22/15 11:00 PM, Dean Wampler wrote: While it's not recommended to overwrite files Hive thinks it understands,

Re: Spark-hive parquet schema evolution

2015-07-22 Thread Cheng Lian
Since Hive doesn’t support schema evolution, you’ll have to update the schema stored in the metastore somehow. For example, you can create a new external table with the merged schema. Say you have a Hive table `t1`: `CREATE TABLE t1 (c0 INT, c1 DOUBLE);` By default, this table is stored in HDFS
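
The "new external table with the merged schema" idea can be sketched as plain string building. This is only an illustration under stated assumptions, not the rest of the truncated message: the helper names `merge_schemas` and `external_table_ddl`, the extra column `c2`, the table name `t1_merged`, and the location `hdfs:///tmp/example/t1` are all hypothetical, and `STORED AS PARQUET` assumes a Hive version that accepts that clause.

```python
def merge_schemas(old, new):
    """Append columns from `new` that `old` lacks, keeping the original order.

    Both arguments are lists of (column_name, hive_type) pairs.
    Names are compared case-insensitively, as Hive does.
    """
    merged = list(old)
    seen = {name.lower(): hive_type for name, hive_type in old}
    for name, hive_type in new:
        key = name.lower()
        if key not in seen:
            merged.append((name, hive_type))
            seen[key] = hive_type
        elif seen[key] != hive_type:
            # This sketch does not attempt type widening; it just refuses.
            raise ValueError("conflicting types for column %r" % name)
    return merged


def external_table_ddl(table, columns, location):
    """Build a CREATE EXTERNAL TABLE statement over existing Parquet files."""
    cols = ", ".join("{} {}".format(n, t) for n, t in columns)
    return (
        "CREATE EXTERNAL TABLE {} ({}) STORED AS PARQUET LOCATION '{}'"
        .format(table, cols, location)
    )


# Schema of t1 from the message, plus a hypothetical new column c2.
old_schema = [("c0", "INT"), ("c1", "DOUBLE")]
new_schema = [("c0", "INT"), ("c1", "DOUBLE"), ("c2", "STRING")]

merged = merge_schemas(old_schema, new_schema)
# The location below is a made-up placeholder, not the table's real path.
ddl = external_table_ddl("t1_merged", merged, "hdfs:///tmp/example/t1")
print(ddl)
```

The resulting string could then be run in the Hive shell or passed to `HiveContext.sql()`; the location would need to point at the directory that actually holds the table's files.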

Re: Spark-hive parquet schema evolution

2015-07-22 Thread Dean Wampler
While it's not recommended to overwrite files Hive thinks it understands, you can add the column to Hive's metastore with an ALTER TABLE command, run as HiveQL in the Hive shell or through HiveContext.sql(): ALTER TABLE mytable ADD COLUMNS (col_name data_type) See
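
A minimal sketch of generating that ALTER TABLE statement from two column lists, assuming you already know the table's current columns and the columns of the newly written data. The helper name `add_columns_statement` and the example column `a` are hypothetical; only `mytable` and the `ADD COLUMNS` form come from the message above.

```python
def add_columns_statement(table, existing, incoming):
    """Return an ALTER TABLE ... ADD COLUMNS statement for the columns in
    `incoming` that are missing from `existing`, or None if there are none.

    Both arguments are lists of (column_name, hive_type) pairs.
    Names are compared case-insensitively, as Hive does.
    """
    known = {name.lower() for name, _ in existing}
    new_cols = [(n, t) for n, t in incoming if n.lower() not in known]
    if not new_cols:
        return None
    cols = ", ".join("{} {}".format(n, t) for n, t in new_cols)
    return "ALTER TABLE {} ADD COLUMNS ({})".format(table, cols)


existing = [("c0", "INT"), ("c1", "DOUBLE")]
incoming = [("c0", "INT"), ("c1", "DOUBLE"), ("a", "STRING")]

stmt = add_columns_statement("mytable", existing, incoming)
print(stmt)  # ALTER TABLE mytable ADD COLUMNS (a STRING)
```

The returned string is what you would paste into the Hive shell or hand to `HiveContext.sql()`. Returning `None` when nothing changed lets the caller skip the metastore update entirely.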

Re: Spark-hive parquet schema evolution

2015-07-21 Thread Jerrick Hoang
Hi Lian, Sorry, I'm new to Spark so I did not express myself very clearly. I'm concerned about the situation where, let's say, I have a Parquet table with some partitions, and I add a new column A to the Parquet schema and write some data with the new schema to a new partition in the table. If I'm not

Re: Spark-hive parquet schema evolution

2015-07-21 Thread Cheng Lian
Hey Jerrick, What do you mean by schema evolution with Hive metastore tables? Hive doesn't take schema evolution into account. Could you please give a concrete use case? Are you trying to write Parquet data with extra columns into an existing metastore Parquet table? Cheng On 7/21/15 1:04

Re: Spark-hive parquet schema evolution

2015-07-20 Thread Jerrick Hoang
I'm new to Spark; any ideas would be much appreciated! Thanks On Sat, Jul 18, 2015 at 11:11 AM, Jerrick Hoang jerrickho...@gmail.com wrote: Hi all, I'm aware of the support for schema evolution via the DataFrame API. Just wondering what would be the best way to go about dealing with schema

Spark-hive parquet schema evolution

2015-07-18 Thread Jerrick Hoang
Hi all, I'm aware of the support for schema evolution via the DataFrame API. Just wondering what would be the best way to go about dealing with schema evolution with Hive metastore tables. So, say I create a table via the Spark SQL CLI, how would I deal with Parquet schema evolution? Thanks, J