Ratandeep Ratti created HIVE-19256:
--------------------------------------

             Summary: UDF which shapes the input data according to the specified schema
                 Key: HIVE-19256
                 URL: https://issues.apache.org/jira/browse/HIVE-19256
             Project: Hive
          Issue Type: New Feature
            Reporter: Ratandeep Ratti
            Assignee: Ratandeep Ratti


We use this UDF a lot in our org. It takes an object and a Hive schema and 
makes sure the output object matches the schema completely. In some respects 
it is similar to the {{named_struct}} UDF, which can be used to select columns 
from a struct, but it is more general: it works not only on structs but on all 
Hive data types (except union). The schema can also request certain valid type 
conversions (int -> double, etc.).
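
To make the intended semantics concrete, here is a minimal Python sketch of the shaping logic. It is not the UDF's actual implementation (which would be Java against Hive's ObjectInspector API). The helper names ({{parse_type}}, {{project}}, {{_cast}}), the use of dicts for structs, the choice to fill missing fields with NULL, and the exact set of allowed conversions are all assumptions made for illustration. Only "extra fields are dropped" and "int -> double is allowed" come from the description above.

```python
def _name(s, i):
    # Read an identifier up to the next delimiter.
    j = i
    while j < len(s) and s[j] not in "<>,:":
        j += 1
    return s[i:j], j


def _expect(s, i, ch):
    if i >= len(s) or s[i] != ch:
        raise ValueError(f"expected {ch!r} at position {i}")
    return i + 1


def _parse(s, i):
    name, i = _name(s, i)
    kind = name.lower()
    if not kind:
        raise ValueError(f"expected a type name at position {i}")
    if kind == "array":
        i = _expect(s, i, "<")
        elem, i = _parse(s, i)
        return ("array", elem), _expect(s, i, ">")
    if kind == "map":
        i = _expect(s, i, "<")
        key, i = _parse(s, i)
        i = _expect(s, i, ",")
        val, i = _parse(s, i)
        return ("map", key, val), _expect(s, i, ">")
    if kind == "struct":
        i = _expect(s, i, "<")
        fields = []
        while True:
            fname, i = _name(s, i)
            i = _expect(s, i, ":")
            ftype, i = _parse(s, i)
            fields.append((fname, ftype))
            if i < len(s) and s[i] == ",":
                i += 1
                continue
            return ("struct", fields), _expect(s, i, ">")
    return ("prim", kind), i


def parse_type(text):
    """Parse a Hive type string (primitives, array, map, struct; no union)."""
    s = text.replace(" ", "")
    node, pos = _parse(s, 0)
    if pos != len(s):
        raise ValueError(f"unexpected trailing text: {s[pos:]!r}")
    return node


def _cast(value, target):
    # Assumed conversion rules: identity plus int -> float/double widening.
    is_int = isinstance(value, int) and not isinstance(value, bool)
    if target in ("int", "bigint") and is_int:
        return value
    if target in ("float", "double") and (is_int or isinstance(value, float)):
        return float(value)
    if target == "string" and isinstance(value, str):
        return value
    if target == "boolean" and isinstance(value, bool):
        return value
    raise TypeError(f"cannot convert {value!r} to {target}")


def project(value, schema):
    """Reshape value so that it contains exactly what schema describes."""
    if value is None:
        return None
    kind = schema[0]
    if kind == "array":
        return [project(v, schema[1]) for v in value]
    if kind == "map":
        return {project(k, schema[1]): project(v, schema[2])
                for k, v in value.items()}
    if kind == "struct":
        # Keep only the declared fields; a missing field becomes None (assumption).
        return {name: project(value.get(name), ftype)
                for name, ftype in schema[1]}
    return _cast(value, schema[1])


def generic_project(value, schema_text):
    return project(value, parse_type(schema_text))


row = {"a": [{"c": 1, "d": "x", "e": True}], "b": 7}
print(generic_project(row, "struct<a:array<struct<c:int,d:string>>>"))
# {'a': [{'c': 1, 'd': 'x'}]}
print(generic_project({"a": 3}, "struct<a:double>"))
# {'a': 3.0}
```

The schema drives the recursion, so any field that the schema does not name is never copied to the output, regardless of how deeply it is nested.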

One scenario where this is quite useful is making sure that the Hive view 
created with a specific schema will have columns which will always match that 
schema. In Hive today when a view is created, new nested columns from the 
underlying table can leak out from the view, even though the user never wanted 
this behavior. Note that this leaking of columns is only for nested columns and 
not for top level columns, so in that regard this behavior of Hive is 
inconsistent.

Sample usage of the UDF
{code}
-- Returns data that matches the specified schema. Extra columns that are not
-- part of the schema are removed.
generic_project(col, "struct<a:array<struct<c:int,d:string>>>")

-- If the input column is a struct whose field 'a' is an int, 'a' is cast to
-- double.
generic_project(col, "struct<a:double>")
{code}





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
