FYI,
The latest Hive 0.14/Parquet will have column renaming support.
Jianshi
On Wed, Dec 10, 2014 at 3:37 AM, Michael Armbrust mich...@databricks.com
wrote:
You might also try out the recently added support for views.
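For example, a view can expose the Pig-generated columns under friendlier names. A sketch, assuming the pmt table and the sorted:: columns from earlier in this thread:

```sql
CREATE VIEW pmt_renamed (id, cre_ts) AS
SELECT `sorted::id`, `sorted::cre_ts`
FROM pmt;
```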
On Mon, Dec 8, 2014 at 9:31 PM, Jianshi Huang jianshi.hu...@gmail.com
wrote:
This is by Hive's design. From the Hive documentation:
The column change command will only modify Hive's metadata, and will not
modify data. Users should make sure the actual data layout of the
table/partition conforms with the metadata definition.
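Concretely, a rename along these lines only rewrites the metastore entry; the Parquet files on disk still carry the old column name, so Hive can no longer match the on-disk data to the renamed column. A sketch using the table and column from this thread (string is a placeholder for the actual column type):

```sql
ALTER TABLE pmt CHANGE `sorted::cre_ts` cre_ts string;
```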
On Sat, Dec 6, 2014 at 8:28 PM, Jianshi
Ah... I see. Thanks for pointing it out.
Then it means we cannot mount an external table using customized column names.
Hmm...
Then the only option left is to use a subquery to add a bunch of column
aliases. I'll try it later.
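Something like the following, as a sketch (pmt and the column names are the ones from the earlier messages):

```sql
SELECT `sorted::id`     AS id,
       `sorted::cre_ts` AS cre_ts
FROM pmt;
```

A view wrapping this query would avoid repeating the aliases in every downstream query.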
Thanks,
Jianshi
On Tue, Dec 9, 2014 at 3:34 AM, Michael Armbrust
Very interesting, the line doing DROP TABLE throws an exception. After
removing it, everything works.
Jianshi
On Sat, Dec 6, 2014 at 9:11 AM, Jianshi Huang jianshi.hu...@gmail.com
wrote:
Hmm... another issue I found with this approach is that ANALYZE TABLE ...
COMPUTE STATISTICS will fail to attach the metadata to the table, and later
broadcast joins and such will fail...
Any idea how to fix this issue?
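For reference, the statement in question is along these lines (pmt is the table from this thread; the noscan variant only collects file-level statistics):

```sql
ANALYZE TABLE pmt COMPUTE STATISTICS;
ANALYZE TABLE pmt COMPUTE STATISTICS noscan;
```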
Jianshi
On Sat, Dec 6, 2014 at 9:10 PM, Jianshi Huang
Ok, found another possible bug in Hive.
My current solution is to use ALTER TABLE CHANGE to rename the column names.
The problem is that after renaming the column names, the values of the columns
all became NULL.
Before renaming:
scala> sql("select `sorted::cre_ts` from pmt limit 1").collect
res12:
Hi,
I had to use Pig for some preprocessing and to generate Parquet files for
Spark to consume.
However, due to Pig's limitations, the generated schema contains Pig's
identifiers,
e.g.
sorted::id, sorted::cre_ts, ...
I tried to put the schema inside CREATE EXTERNAL TABLE, e.g.
create external
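A sketch of what such a statement might look like, with the Pig identifiers wrapped in backquotes (the types, file format clause, and path are placeholders, not the original statement):

```sql
CREATE EXTERNAL TABLE pmt (
  `sorted::id` bigint,
  `sorted::cre_ts` string
)
STORED AS PARQUET
LOCATION '/path/to/parquet';
```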
Here's the solution I got after talking with Liancheng:
1) using backquote `..` to wrap up all illegal characters
val rdd = parquetFile(file)
val schema = rdd.schema.fields.map(f => s"`${f.name}` ${HiveMetastoreTypes.toMetastoreType(f.dataType)}").mkString(",\n")
val ddl_13 = s
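The DDL string itself is cut off above; as a rough sketch of the overall approach (the table name, format clause, and location are placeholders, not the original code):

```scala
// Build a Hive DDL string from the Parquet schema, backquoting each
// field name so identifiers like sorted::id stay legal in HiveQL.
val rdd = parquetFile(file)
val schema = rdd.schema.fields
  .map(f => s"`${f.name}` ${HiveMetastoreTypes.toMetastoreType(f.dataType)}")
  .mkString(",\n")

val ddl_13 = s"""
  |CREATE EXTERNAL TABLE pmt (
  |$schema
  |)
  |STORED AS PARQUET
  |LOCATION '$file'
  """.stripMargin
sql(ddl_13)
```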