Re: Discussion: permanent UDF with database name

2015-12-17 Thread jipengz...@meilishuo.com
@Furcy Pin
I agree with your idea!
I found that since Hive 0.13, users can define permanent UDFs, but each one 
must be bound to a database name.
So if we want to use a UDF without the database name, we must create it in 
every database.
This leads to another problem: when we create a new database, we need to 
collect all of the UDFs that we have defined
and then create them one by one.
This is the biggest problem I have encountered when using permanent UDFs.
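
The workaround described above (re-creating every UDF in every database) could be scripted. Below is a minimal Python sketch that only generates the CREATE FUNCTION statements; it does not run them. The database names, function name, class name and JAR URI are hypothetical placeholders, and the statement shape assumes the usual `CREATE FUNCTION db.name AS 'class' USING JAR 'uri'` form available since Hive 0.13.

```python
def create_function_statements(databases, udfs, jar_uri):
    """Build one CREATE FUNCTION statement per (database, UDF) pair.

    databases: list of database names
    udfs: dict mapping function name -> fully qualified Java class name
    jar_uri: location of the JAR that contains the UDF classes
    """
    statements = []
    for db in databases:
        # Sort by function name so the output order is stable.
        for func_name, class_name in sorted(udfs.items()):
            statements.append(
                f"CREATE FUNCTION {db}.{func_name} AS '{class_name}' "
                f"USING JAR '{jar_uri}';"
            )
    return statements


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    databases = ["default", "sales"]
    udfs = {"my_lower": "com.example.hive.udf.MyLower"}
    jar_uri = "hdfs:///udfs/example-udfs.jar"
    for stmt in create_function_statements(databases, udfs, jar_uri):
        print(stmt)
```

The generated statements would still have to be re-run whenever a new database is created, which is exactly the maintenance burden described above.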

jipengzeng



 
From: Furcy Pin
Date: 2015-12-17 20:14
To: user
Subject: Discussion: permanent UDF with database name
Hi Hive users,

I would like to pursue the discussion that happened during the design of the 
feature:
https://issues.apache.org/jira/browse/HIVE-6167

Some concerns were raised back then, and I think that now that it has been 
implemented, some user feedback could add grist to the mill.

Even if I understand the utility of grouping UDFs inside databases, I find it 
really annoying not to be able to define my UDFs globally.

For me, one of the main benefits of UDFs is to extend the built-in Hive 
functions with the company's user-defined functions, either because some useful 
generic functions are missing from the built-in functions or to add 
business-specific functions.

In the latter case, I understand very well the necessity of qualifying them 
with a business-specific database name. But in the former case?


Let's take an example:
It happened several times that we needed a Hive UDF that did not yet exist in 
the Hive version that we were currently running. To use it, all we had to do 
was take the UDF's source code from a more recent version of Hive, build it 
into a JAR, and add the UDF manually.

When we upgraded, we only had to remove our UDF since it was now built-in.

(To be more specific, this happened with collect_list prior to Hive 0.13.)

With HIVE-6167, this became impossible, since we have to create a 
"database_name.function_name" and use it as is. Hence, when upgrading we need 
to replace every "database_name.function_name" with "function_name".

This is just an example, but I would like to emphasize the point that sometimes 
we want to create permanent UDFs that are as global as built-in UDFs, without 
having to care whether a function is built-in or user-defined. As someone 
pointed out in HIVE-6167's discussion, imagine if all the built-in UDFs had to 
be called with "sys.function_name".

I would just like to have other Hive users' feedback on that matter.

Has anyone else had similar issues with this behavior? How did you deal with them?

Maybe it would make sense to create a feature request to allow specifying a 
GLOBAL keyword when creating a permanent UDF, for the cases where we really 
want it to be global?

What do you think?

Regards,

Furcy



Discussion: permanent UDF with database name

2015-12-17 Thread Furcy Pin
Hi Hive users,

I would like to pursue the discussion that happened during the design of
the feature:
https://issues.apache.org/jira/browse/HIVE-6167

Some concerns were raised back then, and I think that now that it has been
implemented, some user feedback could add grist to the mill.

Even if I understand the utility of grouping UDFs inside databases, I find
it really annoying not to be able to define my UDFs globally.

For me, one of the main benefits of UDFs is to extend the built-in Hive
functions with the company's user-defined functions, either because some
useful generic functions are missing from the built-in functions or to add
business-specific functions.

In the latter case, I understand very well the necessity of qualifying them
with a business-specific database name. But in the former case?


Let's take an example:
It happened several times that we needed a Hive UDF that did not yet exist
in the Hive version that we were currently running. To use it, all we had
to do was take the UDF's source code from a more recent version of Hive,
build it into a JAR, and add the UDF manually.

When we upgraded, we only had to remove our UDF since it was now built-in.

(To be more specific, this happened with collect_list prior to Hive 0.13.)

With HIVE-6167, this became impossible, since we have to create a
"database_name.function_name" and use it as is. Hence, when upgrading we
need to replace every "database_name.function_name" with
"function_name".

This is just an example, but I would like to emphasize the point that
sometimes we want to create permanent UDFs that are as global as built-in
UDFs, without having to care whether a function is built-in or
user-defined. As someone pointed out in HIVE-6167's discussion, imagine if
all the built-in UDFs had to be called with "sys.function_name".
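
To make the upgrade cost concrete, here is a rough Python sketch of the rename step mentioned above: rewriting calls to "database_name.function_name" into plain "function_name" in a query string. The database name `udfs` is a hypothetical placeholder, and this is a naive text rewrite: it does not understand string literals, comments or backtick-quoted identifiers, so its output would need review.

```python
import re


def unqualify_function(query, database, function):
    """Replace calls to `database.function(` with `function(`.

    Only matches when the qualified name is followed by an opening
    parenthesis, so table references such as `database.table` are left alone.
    """
    pattern = re.compile(
        r"\b" + re.escape(database) + r"\." + re.escape(function) + r"(?=\s*\()",
        flags=re.IGNORECASE,
    )
    return pattern.sub(function, query)


if __name__ == "__main__":
    # `udfs` is a hypothetical database name used for illustration.
    query = "SELECT udfs.collect_list(x) FROM udfs.t"
    print(unqualify_function(query, "udfs", "collect_list"))
```

Every stored script, view and scheduled query would need this kind of pass, which is the migration burden that a globally registered UDF would avoid.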

I would just like to have other Hive users' feedback on that matter.

Has anyone else had similar issues with this behavior? How did you deal
with them?

Maybe it would make sense to create a feature request to allow specifying
a GLOBAL keyword when creating a permanent UDF, for the cases where we
really want it to be global?

What do you think?

Regards,

Furcy