[jira] [Commented] (HIVE-11663) Auto load/unload custom udf function for hive cli and hiveserver2
[ https://issues.apache.org/jira/browse/HIVE-11663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14908481#comment-14908481 ] Ashutosh Chauhan commented on HIVE-11663: - Permanent UDFs, introduced in HIVE-6047, are meant to serve the use case you have outlined. Why wouldn't permanent UDFs be sufficient here? > Auto load/unload custom udf function for hive cli and hiveserver2 > - > > Key: HIVE-11663 > URL: https://issues.apache.org/jira/browse/HIVE-11663 > Project: Hive > Issue Type: Improvement > Components: CLI, Configuration >Affects Versions: 0.14.0, 1.0.0, 1.0.1, 1.1.1, 1.2.1 >Reporter: liuzongquan >Assignee: liuzongquan > Labels: features, patch > Attachments: HIVE-11663-2.patch > > Original Estimate: 96h > Time Spent: 96h > Remaining Estimate: 0h > > when adding custom functions used in hiveserver2, the most method is re-build > the hive source code, re-dist and restart hiveserver2. This way will produce > big cost for service user and cluster manager. The best way, in my opinion, > the custom udf should be like a plugin to the hiveserver2 and hive cli, and > users can add and remove at run-time, especially for hiveserver2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
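For comparison with the proposal, here is a minimal Java sketch of Ashutosh's alternative: it renders the permanent-function DDL added by HIVE-6047 (`CREATE FUNCTION name AS 'class' USING JAR 'uri'`) for one of the UDFs mentioned later in this thread. The class `PermanentUdfDdl` and the HDFS directory are hypothetical; the statement would be run once, for example from the Hive CLI or Beeline, rather than on every session start.

```java
/**
 * Hypothetical helper (not from any patch in this thread): renders the
 * permanent-function DDL introduced by HIVE-6047 for a single UDF.
 */
class PermanentUdfDdl {

    /** Returns "CREATE FUNCTION name AS 'class' USING JAR 'uri';" for one function. */
    static String createFunction(String functionName, String className, String jarUri) {
        return "CREATE FUNCTION " + functionName
                + " AS '" + className + "'"
                + " USING JAR '" + jarUri + "';";
    }

    public static void main(String[] args) {
        // Class and jar names come from the hive-udf.properties example in this thread;
        // the HDFS directory is a made-up placeholder.
        String jarUri = "hdfs:///user/hive/udf/com.meilishuo.hive.udf.jar";
        System.out.println(createFunction("ip2long", "com.meilishuo.hive.udf.UDFIp2Long", jarUri));
        // prints: CREATE FUNCTION ip2long AS 'com.meilishuo.hive.udf.UDFIp2Long' USING JAR 'hdfs:///user/hive/udf/com.meilishuo.hive.udf.jar';
    }
}
```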
[jira] [Commented] (HIVE-11663) Auto load/unload custom udf function for hive cli and hiveserver2
[ https://issues.apache.org/jira/browse/HIVE-11663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14804894#comment-14804894 ] liuzongquan commented on HIVE-11663: Lefty, sorry for the late reply. I will change it as you suggested. Thank you so much!
[jira] [Commented] (HIVE-11663) Auto load/unload custom udf function for hive cli and hiveserver2
[ https://issues.apache.org/jira/browse/HIVE-11663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731851#comment-14731851 ] Lefty Leverenz commented on HIVE-11663: --- [~liuzongquan@mls], sorry about the delay. Here's my review of the configuration parameters in HiveConf.java, patch 2.
# Why did you remove four comment lines starting with "// Hadoop Configuration Properties"?
# Formatting: For hive.server2.customized.udf.jars.path and hive.server2.customized.udf.properties please insert newlines (\n) in the descriptions as shown on lines 324 and 325, start the description of hive.server2.customized.udf.jars.path on a new line, and use standard indenting.
# Capitalization: For hive.server2.customized.udf.jars.path please change "The path on hdfs" to "The path on HDFS" and "means the hdfs path" to "means the HDFS path".
# Edit: For hive.server2.customized.udf.jars.path please change "When hdfs:// missing" to "When hdfs:// is missing".
# Edits: For hive.server2.customized.udf.properties please change "customized udf" to "customized UDF", change "defination" to "definition", change "It's must be under $" to "It must be under the $", and change "The patern is as below." to "The pattern is".
A Review Board entry for this would make it easier to review, because comments could be posted on particular lines.
[jira] [Commented] (HIVE-11663) Auto load/unload custom udf function for hive cli and hiveserver2
[ https://issues.apache.org/jira/browse/HIVE-11663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14721766#comment-14721766 ] Lefty Leverenz commented on HIVE-11663: --- Thanks for the changes. I'll post more review comments soon. But you need code review more than doc review, and I'm not qualified to review code. Perhaps if you posted the patch on the review board you would get more responses.
* [Review Board | https://cwiki.apache.org/confluence/display/Hive/Review+Board]
If you have any trouble with the review board, I'll gladly help. (I had trouble the first time, and finally discovered that I had to use a Firefox browser instead of Safari to get it to work.)
[jira] [Commented] (HIVE-11663) Auto load/unload custom udf function for hive cli and hiveserver2
[ https://issues.apache.org/jira/browse/HIVE-11663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716148#comment-14716148 ] liuzongquan commented on HIVE-11663: I am a newcomer! Thanks a lot, Lefty! I'll modify the issue!
[jira] [Commented] (HIVE-11663) Auto load/unload custom udf function for hive cli and hiveserver2
[ https://issues.apache.org/jira/browse/HIVE-11663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716150#comment-14716150 ] liuzongquan commented on HIVE-11663: I am a newcomer! Thanks a lot, Lefty! I'll reopen the issue!
[jira] [Commented] (HIVE-11663) Auto load/unload custom udf function for hive cli and hiveserver2
[ https://issues.apache.org/jira/browse/HIVE-11663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716140#comment-14716140 ] Lefty Leverenz commented on HIVE-11663: --- This issue should not be marked Resolved until the patch has been reviewed, tested, and committed. Please see How To Contribute:
* [How To Contribute -- Review Process | https://cwiki.apache.org/confluence/display/Hive/HowToContribute#HowToContribute-ReviewProcess]
* [How To Contribute -- Contributing Your Work | https://cwiki.apache.org/confluence/display/Hive/HowToContribute#HowToContribute-ContributingYourWork]
[jira] [Commented] (HIVE-11663) Auto load/unload custom udf function for hive cli and hiveserver2
[ https://issues.apache.org/jira/browse/HIVE-11663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716190#comment-14716190 ] Lefty Leverenz commented on HIVE-11663: --- Review comments for the first patch:
# conf/hive-default.xml.template is now generated from HiveConf.java (see the WARNING!!! lines in the patch), so it doesn't belong in the patch
# many or perhaps all files appear twice in the patch
# new configuration parameters in HiveConf.java need editing:
#* for *hive.server2.customized.udf.enabled* please change "Whether enable hiveserver add customized udf from adding jars ..." to "Whether to enable HiveServer2 or the Hive CLI to add a customized UDF from a jar file ..."
#* for *hive.server2.customized.udf.jars.path* please supply a parameter description (even though it may seem obvious)
#* for *hive.server2.customized.udf.properties* change "The customized udf functions' defination file. It's must be under $HIVE_HOME/conf directory. The patern is as below." to "The customized UDF functions' definition file. It must be under the $HIVE_HOME/conf directory. The pattern is as below."
#* for *hive.server2.customized.udf.defination.list* change defination to definition, and for HIVE_SERVER2_CUSTOMIZED_UDF_DEFINATION_LIST similarly change DEFINATION to DEFINITION (also change all occurrences elsewhere in the patch -- I count 26 but that includes the template file and duplicate files)
#* for the description of *hive.server2.customized.udf.defination.list*, presumably the hive.thrift.customized.* parameters are supposed to be hive.server2.customized.*, and other edits are needed:
{code}
+    "The customized UDF functions' definition list. For example, if there is a UDF jar on the \n" +
+    "${hive.server2.customized.udf.jars.path} containing a foo.bar class which defines a UDF \n" +
+    "function named test, then you can add \'foo.bar:test\' to the property \n" +
+    "hive.server2.customized.udf.definition.list. Any additional definitions are separated with commas.\n" +
+    "Note that this property depends on hive.server2.customized.udf.enabled being set to true."
{code}
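The list format in the suggested description (`className:functionName` entries such as `foo.bar:test`, separated by commas) could be parsed roughly as below. This is a hypothetical Java sketch, not code from the patch; it maps each function name to its class and assumes function names are unique, so a repeated name would overwrite the earlier entry.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical parser for a hive.server2.customized.udf.definition.list value. */
class UdfDefinitionList {

    /** Parses "foo.bar:test,foo.baz:other" into function name -> class name, keeping input order. */
    static Map<String, String> parse(String value) {
        Map<String, String> functions = new LinkedHashMap<>();
        if (value == null || value.trim().isEmpty()) {
            return functions; // nothing configured
        }
        for (String entry : value.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate stray or trailing commas
            }
            int colon = trimmed.indexOf(':');
            if (colon <= 0 || colon == trimmed.length() - 1) {
                throw new IllegalArgumentException("Expected className:functionName but got: " + trimmed);
            }
            String className = trimmed.substring(0, colon).trim();
            String functionName = trimmed.substring(colon + 1).trim();
            functions.put(functionName, className);
        }
        return functions;
    }

    public static void main(String[] args) {
        System.out.println(parse("foo.bar:test, foo.baz:other"));
        // prints: {test=foo.bar, other=foo.baz}
    }
}
```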
[jira] [Commented] (HIVE-11663) Auto load/unload custom udf function for hive cli and hiveserver2
[ https://issues.apache.org/jira/browse/HIVE-11663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716231#comment-14716231 ] liuzongquan commented on HIVE-11663: I'll modify the patch as you suggested. Thank you so much, Lefty!
[jira] [Commented] (HIVE-11663) Auto load/unload custom udf function for hive cli and hiveserver2
[ https://issues.apache.org/jira/browse/HIVE-11663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716072#comment-14716072 ] liuzongquan commented on HIVE-11663: The implementation modifies CLIService.java and CliDriver.java. When a client connects to Hive through Thrift, CLIService calls initCustomUDF(SessionHandle sessionHandle), which loads all functions configured in hive-udf.properties, a custom properties file under $HIVE_HOME/conf. Actually, the file name hive-udf.properties can be changed in hive-site.xml, which holds all the configuration for this feature, as below.
<property>
  <name>hive.server2.customized.udf.enabled</name>
  <value>false</value>
  <description>Whether enable hiveserver add customized udf from adding jars, default value is false</description>
</property>
<property>
  <name>hive.server2.customized.udf.jars.path</name>
  <value/>
  <description/>
</property>
<property>
  <name>hive.server2.customized.udf.properties</name>
  <value>hive-udf.properties</value>
  <description>The customized udf functions' defination file. It's must be under $HIVE_HOME/conf directory. The patern is as below.func_name={udf_jar_name,udf_class_name}</description>
</property>
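The on/off switch plus file-name setting in that configuration can be illustrated with a small Java sketch. It is hypothetical: a plain `Map` stands in for `HiveConf`, `hiveHome` stands in for `$HIVE_HOME`, and the defaults (`false`, `hive-udf.properties`) are taken from the XML in the comment. It also rejects file names that would resolve outside `$HIVE_HOME/conf`, since the description says the file must live there.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch: find the UDF definitions file, or nothing when the feature is off. */
class CustomUdfSettings {
    static final String ENABLED_KEY = "hive.server2.customized.udf.enabled";
    static final String PROPERTIES_KEY = "hive.server2.customized.udf.properties";

    static Optional<Path> definitionsFile(Map<String, String> conf, String hiveHome) {
        // Default is false, matching the hive-site.xml snippet.
        boolean enabled = Boolean.parseBoolean(conf.getOrDefault(ENABLED_KEY, "false"));
        if (!enabled) {
            return Optional.empty();
        }
        // Default file name, matching the hive-site.xml snippet.
        String fileName = conf.getOrDefault(PROPERTIES_KEY, "hive-udf.properties");
        Path confDir = Paths.get(hiveHome, "conf").normalize();
        Path file = confDir.resolve(fileName).normalize();
        // The file must be inside $HIVE_HOME/conf, so refuse "../x", absolute paths, or the directory itself.
        if (!file.startsWith(confDir) || file.equals(confDir)) {
            throw new IllegalArgumentException(PROPERTIES_KEY + " must name a file under " + confDir);
        }
        return Optional.of(file);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(ENABLED_KEY, "true");
        System.out.println(definitionsFile(conf, "/opt/hive"));
    }
}
```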
[jira] [Commented] (HIVE-11663) Auto load/unload custom udf function for hive cli and hiveserver2
[ https://issues.apache.org/jira/browse/HIVE-11663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716080#comment-14716080 ] liuzongquan commented on HIVE-11663: {code:xml}
# mls udf
multi_concat= {com.meilishuo.hive.udf.jar,com.meilishuo.hive.udf.UDFMultiConcat}
unfold={com.meilishuo.hive.udf.jar,com.meilishuo.hive.udf.UDFUnfoldURL}
ip2long={com.meilishuo.hive.udf.jar,com.meilishuo.hive.udf.UDFIp2Long}
{code}
[jira] [Commented] (HIVE-11663) Auto load/unload custom udf function for hive cli and hiveserver2
[ https://issues.apache.org/jira/browse/HIVE-11663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716081#comment-14716081 ] liuzongquan commented on HIVE-11663: {code:xml}
<!-- hive-site.xml -->
<property>
  <name>hive.server2.customized.udf.enabled</name>
  <value>false</value>
  <description>Whether enable hiveserver add customized udf from adding jars, default value is false</description>
</property>
<property>
  <name>hive.server2.customized.udf.jars.path</name>
  <value/>
  <description/>
</property>
<property>
  <name>hive.server2.customized.udf.properties</name>
  <value>hive-udf.properties</value>
  <description>The customized udf functions' defination file. It's must be under $HIVE_HOME/conf directory. The patern is as below.func_name={udf_jar_name,udf_class_name}</description>
</property>
{code}
[jira] [Commented] (HIVE-11663) Auto load/unload custom udf function for hive cli and hiveserver2
[ https://issues.apache.org/jira/browse/HIVE-11663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716077#comment-14716077 ] liuzongquan commented on HIVE-11663: The correct format for hive-udf.properties:
# mls udf
multi_concat= {com.meilishuo.hive.udf.jar,com.meilishuo.hive.udf.UDFMultiConcat}
unfold={com.meilishuo.hive.udf.jar,com.meilishuo.hive.udf.UDFUnfoldURL}
ip2long={com.meilishuo.hive.udf.jar,com.meilishuo.hive.udf.UDFIp2Long}
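If the file is read with `java.util.Properties` (an assumption; the patch's parser is not shown in this thread), `#` starts a comment line, so each definition must sit on its own line. A hypothetical Java sketch of reading the file and splitting each `{udf_jar_name,udf_class_name}` value follows; `HiveUdfProperties` and `UdfDef` are illustrative names, not classes from the patch, and the sketch assumes neither the jar name nor the class name contains a comma.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/** Hypothetical reader for hive-udf.properties entries: func_name={udf_jar_name,udf_class_name}. */
class HiveUdfProperties {

    /** The two parts of one definition. */
    static final class UdfDef {
        final String jarName;
        final String className;

        UdfDef(String jarName, String className) {
            this.jarName = jarName;
            this.className = className;
        }
    }

    static Map<String, UdfDef> parse(String text) {
        Properties props = new Properties();
        try {
            // Properties skips '#' comment lines and the whitespace after '='.
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory reader
        }
        Map<String, UdfDef> defs = new TreeMap<>(); // sorted by function name
        for (String name : props.stringPropertyNames()) {
            String value = props.getProperty(name).trim();
            if (!value.startsWith("{") || !value.endsWith("}")) {
                throw new IllegalArgumentException(name + ": expected {udf_jar_name,udf_class_name} but got " + value);
            }
            String[] parts = value.substring(1, value.length() - 1).split(",");
            if (parts.length != 2 || parts[0].trim().isEmpty() || parts[1].trim().isEmpty()) {
                throw new IllegalArgumentException(name + ": expected exactly two comma-separated parts in " + value);
            }
            defs.put(name, new UdfDef(parts[0].trim(), parts[1].trim()));
        }
        return defs;
    }

    public static void main(String[] args) {
        String text = "# mls udf\n"
                + "multi_concat= {com.meilishuo.hive.udf.jar,com.meilishuo.hive.udf.UDFMultiConcat}\n"
                + "ip2long={com.meilishuo.hive.udf.jar,com.meilishuo.hive.udf.UDFIp2Long}\n";
        for (Map.Entry<String, UdfDef> e : parse(text).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue().className + " in " + e.getValue().jarName);
        }
    }
}
```

Parsing the flattened one-line form (`# mls udf multi_concat= {...} unfold={...}`) with this sketch yields no definitions at all, because the whole line is treated as a comment.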
[jira] [Commented] (HIVE-11663) Auto load/unload custom udf function for hive cli and hiveserver2
[ https://issues.apache.org/jira/browse/HIVE-11663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716076#comment-14716076 ] liuzongquan commented on HIVE-11663: To add a custom UDF for use in HiveServer2 and the Hive CLI, just upload the UDF jar file to ${hive.server2.customized.udf.jars.path}, which is an hdfs:// path, and then append the UDF properties to hive-udf.properties as follows:
# mls udf
multi_concat={com.meilishuo.hive.udf.jar,com.meilishuo.hive.udf.UDFMultiConcat}
unfold={com.meilishuo.hive.udf.jar,com.meilishuo.hive.udf.UDFUnfoldURL}
ip2long={com.meilishuo.hive.udf.jar,com.meilishuo.hive.udf.UDFIp2Long}
That's all you need to do to add new custom UDFs, and HiveServer2 can use them immediately!
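The steps in that comment can be sketched as the statements a session might run. This Java example assumes, rather than reproduces, the patch's mechanism: one plausible approach is for a hook like initCustomUDF to issue the standard HiveQL `ADD JAR` and `CREATE TEMPORARY FUNCTION` statements for each entry. `SessionUdfStatements` is a hypothetical name, and the HDFS directory stands in for ${hive.server2.customized.udf.jars.path}; check that your Hive version's `ADD JAR` accepts an hdfs:// URI before relying on this.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: HiveQL a session could run to register custom UDFs. */
class SessionUdfStatements {

    /** One definition: the jar file name and the UDF class inside it. */
    static final class Udf {
        final String jarName;
        final String className;

        Udf(String jarName, String className) {
            this.jarName = jarName;
            this.className = className;
        }
    }

    /** jarsPath plays the role of ${hive.server2.customized.udf.jars.path}. */
    static List<String> build(String jarsPath, Map<String, Udf> defs) {
        String base = jarsPath.endsWith("/") ? jarsPath : jarsPath + "/";
        List<String> statements = new ArrayList<>();
        Set<String> addedJars = new HashSet<>();
        for (Map.Entry<String, Udf> e : defs.entrySet()) {
            Udf udf = e.getValue();
            // Add each jar only once, even when several functions share it.
            if (addedJars.add(udf.jarName)) {
                statements.add("ADD JAR " + base + udf.jarName);
            }
            statements.add("CREATE TEMPORARY FUNCTION " + e.getKey() + " AS '" + udf.className + "'");
        }
        return statements;
    }

    public static void main(String[] args) {
        Map<String, Udf> defs = new LinkedHashMap<>();
        defs.put("multi_concat", new Udf("com.meilishuo.hive.udf.jar", "com.meilishuo.hive.udf.UDFMultiConcat"));
        defs.put("ip2long", new Udf("com.meilishuo.hive.udf.jar", "com.meilishuo.hive.udf.UDFIp2Long"));
        // Placeholder HDFS directory for the jars path.
        build("hdfs:///user/hive/udf", defs).forEach(System.out::println);
    }
}
```

Temporary functions last only for the session that created them, so statements like these would have to be repeated for every new connection; the permanent functions from HIVE-6047 avoid that repetition.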