So far, my understanding has been that in Hive tables, each partition has a
schema and whenever you add a partition to a Hive table, the current table
schema is copied into the partition schema. This should allow a seamless
evolution of the schema. Recently I came across something that contradicts
this. Hence, looking for some clarification.
So we have a table, and we have fixed ORC format for it. The table has a
schema say (a,b,c,d). We added one partition. The data is stored in the
same order. When we query (a, b) from this partition, the data has the two
columns in the correct order. Now we go ahead and change the schema of the
*table* to (b,c,d,a). But the schema of the partition is still (a,b,c,d) as
verified by doing describe extended on the partition. Now we issue the same
query on the old partition projecting (a,b). Surprisingly, it projects (b,
c). Is this the expected behaviour or am I missing something obvious?
Coming back to the question of schema evolution, as business usecases grow,
there is a need to add fields in the table. So am I restricted by hive to
add my fields at the end only?
Thanks
--
Rajat Khandelwal
Software Engineer
--
_
The information contained in this communication is intended solely for the
use of the individual or entity to whom it is addressed and others
authorized to receive it. It may contain confidential or legally privileged
information. If you are not the intended recipient you are hereby notified
that any disclosure, copying, distribution or taking any action in reliance
on the contents of this information is strictly prohibited and may be
unlawful. If you have received this communication in error, please notify
us immediately by responding to this email and then delete it from your
system. The firm is neither liable for the proper and complete transmission
of the information contained in this communication nor for any delay in its
receipt.