[jira] [Commented] (HIVE-6167) Allow user-defined functions to be qualified with database name
[ https://issues.apache.org/jira/browse/HIVE-6167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13925126#comment-13925126 ]

Lefty Leverenz commented on HIVE-6167:

[~jdere] documented this in the wiki's DDL doc, so I added a brief description in the Hive Plugins doc and linked it to the DDL.
* [DDL: Permanent Functions |https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-PermanentFunctions]
* [Hive Plugins |https://cwiki.apache.org/confluence/display/Hive/HivePlugins]

But the jira description currently has things backwards: "This task would allow users to define temporary UDFs (and eventually permanent UDFs) with a database name" -- it's permanent UDFs that can be defined with a database name, not temporary UDFs.

> Allow user-defined functions to be qualified with database name
> ---
>
> Key: HIVE-6167
> URL: https://issues.apache.org/jira/browse/HIVE-6167
> Project: Hive
> Issue Type: Sub-task
> Components: UDF
> Reporter: Jason Dere
> Assignee: Jason Dere
> Fix For: 0.13.0
>
> Attachments: HIVE-6167.1.patch, HIVE-6167.2.patch, HIVE-6167.3.patch, HIVE-6167.4.patch
>
>
> Function names in Hive are currently unqualified and there is a single namespace for all function names. This task would allow users to define temporary UDFs (and eventually permanent UDFs) with a database name, such as:
> CREATE TEMPORARY FUNCTION userdb.myfunc 'myudfclass';

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6167) Allow user-defined functions to be qualified with database name
[ https://issues.apache.org/jira/browse/HIVE-6167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13899510#comment-13899510 ]

Ashutosh Chauhan commented on HIVE-6167:

OK, if need be, that could be done in a follow-up. +1
[jira] [Commented] (HIVE-6167) Allow user-defined functions to be qualified with database name
[ https://issues.apache.org/jira/browse/HIVE-6167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13899383#comment-13899383 ]

Jason Dere commented on HIVE-6167:

So DOT isn't normally allowed as part of an identifier. I think this would only be possible if the dot was included as part of a quoted name, which seems like a bit of an unusual case. If this case needs to be supported, it may be better handled as a separate issue.
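To illustrate the point in Jason Dere's comment above about dots and quoted names, here is a minimal Python sketch of splitting an optionally database-qualified function name. It assumes backtick quoting for identifiers and treats a dot inside backticks as part of the name. The helper `split_function_name` is hypothetical and is not taken from the patch or from Hive's parser.

```python
def split_function_name(name):
    """Split an optionally db-qualified function name into (db, func).

    Assumption for illustration: a dot inside backticks belongs to the
    identifier, and an unquoted dot separates database from function.
    """
    parts, buf, quoted = [], [], False
    for ch in name:
        if ch == "`":
            quoted = not quoted          # toggle quoted mode, drop the backtick
        elif ch == "." and not quoted:
            parts.append("".join(buf))   # unquoted dot ends the current part
            buf = []
        else:
            buf.append(ch)
    parts.append("".join(buf))

    # Reject unterminated quotes, more than one qualifier, or empty parts.
    if quoted or len(parts) > 2 or any(p == "" for p in parts):
        raise ValueError(f"invalid function name: {name!r}")
    if len(parts) == 1:
        return (None, parts[0])
    return (parts[0], parts[1])
```

Under these assumptions, `userdb.myfunc` splits into a database and a function name, while a quoted name containing a dot stays a single unqualified identifier, which is the unusual case the comment suggests deferring.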
[jira] [Commented] (HIVE-6167) Allow user-defined functions to be qualified with database name
[ https://issues.apache.org/jira/browse/HIVE-6167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898552#comment-13898552 ]

Ashutosh Chauhan commented on HIVE-6167:

Patch looks good. I have one question, though. Earlier, function names were identifiers. I assume DOT can be part of an identifier. If so, could it be a problem now that a function someone created earlier with a dot in its name may confuse Hive?
[jira] [Commented] (HIVE-6167) Allow user-defined functions to be qualified with database name
[ https://issues.apache.org/jira/browse/HIVE-6167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13892491#comment-13892491 ]

Jason Dere commented on HIVE-6167:

I don't think the one failure is related to these changes; bucket_num_reducers has failed on a number of precommit tests.
[jira] [Commented] (HIVE-6167) Allow user-defined functions to be qualified with database name
[ https://issues.apache.org/jira/browse/HIVE-6167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891686#comment-13891686 ]

Hive QA commented on HIVE-6167:

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12626955/HIVE-6167.4.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 5012 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucket_num_reducers
{noformat}

Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1189/testReport
Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1189/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12626955
[jira] [Commented] (HIVE-6167) Allow user-defined functions to be qualified with database name
[ https://issues.apache.org/jira/browse/HIVE-6167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885191#comment-13885191 ] Jason Dere commented on HIVE-6167: -- https://reviews.apache.org/r/17493/ > Allow user-defined functions to be qualified with database name > --- > > Key: HIVE-6167 > URL: https://issues.apache.org/jira/browse/HIVE-6167 > Project: Hive > Issue Type: Sub-task > Components: UDF >Reporter: Jason Dere >Assignee: Jason Dere > Attachments: HIVE-6167.1.patch, HIVE-6167.2.patch > > > Function names in Hive are currently unqualified and there is a single > namespace for all function names. This task would allow users to define > temporary UDFs (and eventually permanent UDFs) with a database name, such as: > CREATE TEMPORARY FUNCTION userdb.myfunc 'myudfclass'; > NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6167) Allow user-defined functions to be qualified with database name
[ https://issues.apache.org/jira/browse/HIVE-6167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873468#comment-13873468 ]

Hive QA commented on HIVE-6167:

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12623290/HIVE-6167.1.patch

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 4929 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_show_describe_func_quotes
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_show_functions
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_clusterbyorderby
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_join_alt_syntax_comma_on
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_lateral_view_join
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_uniquejoin3
org.apache.hadoop.hive.ql.parse.TestMacroSemanticAnalyzer.testCannotUseReservedWordAsName
org.apache.hadoop.hive.ql.parse.TestParseNegative.testParseNegative_macro_reserved_word
{noformat}

Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/932/testReport
Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/932/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12623290
[jira] [Commented] (HIVE-6167) Allow user-defined functions to be qualified with database name
[ https://issues.apache.org/jira/browse/HIVE-6167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13868508#comment-13868508 ]

Jason Dere commented on HIVE-6167:

The current syntax and usage (for temporary functions) will continue to work as they do now. The new syntax additions and behavior (for "permanent" functions registered in the metastore) are what would have extra name qualifiers.
[jira] [Commented] (HIVE-6167) Allow user-defined functions to be qualified with database name
[ https://issues.apache.org/jira/browse/HIVE-6167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13868485#comment-13868485 ]

Edward Capriolo commented on HIVE-6167:

In my opinion we must keep the current syntax working as is. Current users of Hive do not want their scripts to break just to match a standard. If we wish to add new syntax that matches a given standard, that makes sense. I do not think the current standard forbids keeping our current syntax and functionality.

Also, realistically, we have to be practical. Users have sessions, and most users are not going to care what database/schema a function is associated with. Most are going to want global functions. Most people are not going to have so many functions that a conflict would ever arise. Let's not make and solve problems we really don't have.
[jira] [Commented] (HIVE-6167) Allow user-defined functions to be qualified with database name
[ https://issues.apache.org/jira/browse/HIVE-6167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13868476#comment-13868476 ]

Alan Gates commented on HIVE-6167:

There's no need to add it to every database. Users can invoke the function across databases. So it would be
{code}
use DB1;
create function a();
use DB2;
select DB1.a() from T;
{code}

I disagree that this is a usability issue. I think it's much more usable to give users namespaces than to push them into one flat namespace.
[jira] [Commented] (HIVE-6167) Allow user-defined functions to be qualified with database name
[ https://issues.apache.org/jira/browse/HIVE-6167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13868372#comment-13868372 ]

Xuefu Zhang commented on HIVE-6167:

I largely agree except for one thought on #2. If a user adds a perm function without a db name, could we assume it's global? Imagine an admin registering a particular function across all DBs. If Hive insists the function go to a DB, then he/she will have to do the same for each DB:
{code}
use DB1;
create function a();
use DB2;
create function a();
...
{code}
or
{code}
create function DB1.a();
create function DB2.a();
...
{code}

It becomes much simpler if a function without a DB name goes global (in line with temp functions). Is this completely anti SQL standard? Even so, it seems that usability outweighs that, not to mention that Hive functions are non-standard anyway.
[jira] [Commented] (HIVE-6167) Allow user-defined functions to be qualified with database name
[ https://issues.apache.org/jira/browse/HIVE-6167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13868339#comment-13868339 ]

Jason Dere commented on HIVE-6167:

I could get on board with that approach. Xuefu, does that work for you?
[jira] [Commented] (HIVE-6167) Allow user-defined functions to be qualified with database name
[ https://issues.apache.org/jira/browse/HIVE-6167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13868145#comment-13868145 ]

Alan Gates commented on HIVE-6167:

My thoughts:
# I agree that built-in UDFs should be in a global space that need not (probably cannot) be qualified. Everyone needs access to them, and I can't see any value in adding a namespace here.
# Permanent functions added by users or admins should be in a database. This gives us namespace separation, it fits with the SQL standard, and I think it will fit in better with where we're trying to take the security model. I'm fine with {{create function}} defaulting to the current (not necessarily the default) database if users don't specify which database.
# Temporary functions are really session specific (they go away once the user disconnects) and can't be accessed by any other session, correct? So I think it's fine to view them as "global" for the purposes of that session. The following sequence should work:
## {{use db foo}}
## {{create temporary function a();}}
## {{use db bar}}
## {{select a( x ) from T}}
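The three scopes described in Alan Gates's comment above can be modeled with a small Python sketch. This is a toy model under stated assumptions, not Hive's implementation: the `Session` class, its method names, and the shared `metastore` set are all hypothetical, and looking up an unqualified permanent function in the current database is an assumption the comment does not spell out.

```python
class Session:
    """Toy model of built-in, permanent, and temporary function scopes."""

    BUILTINS = {"abs", "concat"}  # global, never qualified

    def __init__(self, metastore, current_db="default"):
        self.metastore = metastore    # shared set of "db.func" keys (permanent)
        self.current_db = current_db
        self.temp = set()             # visible only to this session

    def use(self, db):
        self.current_db = db

    def create_function(self, name, temporary=False):
        if temporary:
            # Temporary functions are session-global and unqualified.
            self.temp.add(name)
            return name
        # Permanent functions land in a database; default to the current one.
        key = name if "." in name else f"{self.current_db}.{name}"
        self.metastore.add(key)
        return key

    def can_call(self, name):
        if name in self.BUILTINS or name in self.temp:
            return True
        # Assumption: an unqualified name is looked up in the current database.
        key = name if "." in name else f"{self.current_db}.{name}"
        return key in self.metastore
```

In this model a temporary function created while in database `foo` stays callable after switching to `bar`, matching the sequence in the comment, while a permanent function created in `bar` needs the `bar.` qualifier when called from another database.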
[jira] [Commented] (HIVE-6167) Allow user-defined functions to be qualified with database name
[ https://issues.apache.org/jira/browse/HIVE-6167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865921#comment-13865921 ]

Xuefu Zhang commented on HIVE-6167:

I think a direct consequence of the proposed behavior is that a user who defines a function (temp or perm) without a qualified name will not be able to use it the way it was defined unless the current database is 'default'. Plus, it seems unusual that in such a case a function defined in the context of a database other than 'default' is actually registered with another database ('default').

I can think of two alternatives:
1. Break backward compatibility so that a non-qualified temp function goes to the current database.
2. Don't qualify user-defined functions (temp or perm) with database names, so that every user-defined function, as well as every built-in function, is global.

I'd like to read others' thoughts as well.
[jira] [Commented] (HIVE-6167) Allow user-defined functions to be qualified with database name
[ https://issues.apache.org/jira/browse/HIVE-6167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865845#comment-13865845 ]

Jason Dere commented on HIVE-6167:

[~xuefuz] brought up this point:
{quote}
Reading the document, I found one thing that seems to be debatable:
1. Creating a function w/o database name means "in the current database of the session".
2. Creating a temp function w/o database name means global in the system as built-in functions.
I understand the consideration of backward compatibility, but the discrepancy can confuse the user a great deal. Why cannot we change #1 in the same way for temp functions?
1'. Creating a function w/o database name means global in the system as built-in functions.
{quote}

It would make sense for there to be some consistency with the default database name for temporary/permanent functions. I've also been playing with qualifying built-in functions with schema names and I don't really like the effect on "show functions":
{noformat}
hive> show functions;
OK
default.myfunc
mydb.myfunc
sys.!
sys.!=
sys.%
sys.&
sys.*
sys.+
sys.-
sys./
sys.<
sys.<=
sys.<=>
sys.<>
sys.=
sys.==
sys.>
sys.>=
sys.^
sys.abs
sys.acos
sys.and
sys.array
sys.array_contains
sys.ascii
...
{noformat}

Many of these functions will not work if they are called as they are labelled, such as any of the operators (sys.+ wouldn't work), or any keywords that are implemented as functions. So I'm wondering if the built-in functions can be in the registry as non-qualified function names.

However, I think that if we do have permanent functions, they should be qualified. But we also want consistency in the default db name between temp and permanent functions. So how about this behavior:
- Built-in functions are not qualified
- User-defined functions (temp/permanent) that are created without a database name will be created using the database name "default".
- Function resolution of non-qualified functions will be in the following order:
1. Lookup using non-qualified name, which will catch built-ins
2. Lookup by qualifying function name with "default". This will catch non-qualified user-defined functions
3. Lookup by qualifying function name with user's current database.

I suppose a future enhancement could be to allow users to specify a custom set of db names when resolving function names, but I think the above would be a suitable default. Does this approach make sense?
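The three-step resolution order proposed in Jason Dere's comment above can be sketched in Python as follows. This is only an illustration of the proposal as stated in the comment, not the patch's code: the `BUILTINS` and `USER_FUNCTIONS` sets, the `mydb.other` entry, and the `resolve` helper are hypothetical, and the sketch ignores case-insensitive matching.

```python
# Hypothetical registry contents for illustration only.
BUILTINS = {"abs", "acos", "and", "+"}              # stored unqualified
USER_FUNCTIONS = {"default.myfunc", "mydb.myfunc", "mydb.other"}


def resolve(name, current_db):
    """Return the registry key a function name resolves to, or None."""
    if "." in name:
        # Already qualified: look it up exactly as written.
        return name if name in USER_FUNCTIONS else None

    # 1. Non-qualified lookup, which catches built-ins.
    if name in BUILTINS:
        return name

    # 2. Qualify with "default", then 3. with the user's current database.
    for db in ("default", current_db):
        key = f"{db}.{name}"
        if key in USER_FUNCTIONS:
            return key
    return None
```

With this ordering, an unqualified `myfunc` called from `mydb` resolves to `default.myfunc` rather than `mydb.myfunc`, which is the behavior questioned in the follow-up discussion.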