Attila Magyar created HIVE-23253:
------------------------------------
Summary: Synchronization between external SerDe schemas and
Metastore
Key: HIVE-23253
URL: https://issues.apache.org/jira/browse/HIVE-23253
Project: Hive
Issue Type: Bug
Components: Hive, Metastore
Affects Versions: 3.1.2
Reporter: Attila Magyar
Fix For: 3.0.0
In HIVE-15995 an ALTER TABLE <table> UPDATE COLUMNS statement was introduced to
sync external SerDe schema changes with the metastore. This command can only be
invoked manually.
See the documentation:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterTable/PartitionUpdatecolumns
It might make sense to run UPDATE COLUMNS automatically in certain cases, to
prevent the problems that occur when the user forgets to run it manually.
One way to reproduce the issue is to change the schema URL via an ALTER TABLE
statement.
{code:java}
[root@c7401 vagrant]# cat test_schema1.avsc
{
  "type":"record",
  "name":"test_schema",
  "namespace":"gdc_datascience_qa",
  "fields":[
    {
      "name":"name",
      "type":[
        "null",
        "string"
      ],
      "default":null
    }
  ]
}
[root@c7401 vagrant]# cat test_schema2.avsc
{
  "type":"record",
  "name":"test_schema",
  "namespace":"gdc_datascience_qa",
  "fields":[
    {
      "name":"name",
      "type":[
        "null",
        "string"
      ],
      "default":null
    },
    {
      "name":"last_name",
      "type":[
        "null",
        "string"
      ],
      "default":null
    }
  ]
}
{code}
{code:java}
$ hadoop fs -copyFromLocal *.avsc /tmp/
[beeline] create external table t1 stored as avro tblproperties
('avro.schema.url'='/tmp/test_schema1.avsc');
[beeline] alter table t1 set
tblproperties('avro.schema.url'='/tmp/test_schema2.avsc');
[beeline] insert into t1 values ('n1', 'l1');
[beeline] create external table t2 stored as avro tblproperties
('avro.schema.url'='/tmp/test_schema2.avsc');
[beeline] insert into t2 values ('n2', 'l2');
[beeline] insert overwrite table t1 select * from t2;
{code}
Error:
{code:java}
MetaException(message:Column last_name doesn't exist in table t1 in database default)
at org.apache.hadoop.hive.metastore.ObjectStore.validateTableCols(ObjectStore.java:8652)
at org.apache.hadoop.hive.metastore.ObjectStore.getMTableColumnStatistics(ObjectStore.java:8602)
at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionColStats(ObjectStore.java:8416)
at org.apache.hadoop.hive.metastore.ObjectStore.updateTableColumnStatistics(ObjectStore.java:8446
{code}
Running ALTER TABLE t1 UPDATE COLUMNS fixes the problem.
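For the reproduction above, the manual workaround might look like the following sketch. It assumes the tables t1 and t2 created in the earlier steps and uses the UPDATE COLUMNS syntax described in the linked documentation. The final statement is the one that previously failed.
{code:sql}
-- Re-sync the metastore column list for t1 with the schema the SerDe reports
-- (here, the schema referenced by avro.schema.url).
ALTER TABLE t1 UPDATE COLUMNS;

-- Retry the statement that failed with the MetaException.
INSERT OVERWRITE TABLE t1 SELECT * FROM t2;
{code}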
cc: [~szita]