[
https://issues.apache.org/jira/browse/PHOENIX-538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14494793#comment-14494793
]
Rajeshbabu Chintaguntla commented on PHOENIX-538:
-------------------------------------------------
Thank you so much [~jamestaylor] for the review.
bq. One thing that's missing (at least I couldn't find it) was how you're going
to handle serialization/deserialization for UDFs. Phoenix today relies on
ExpressionType to do this. Every expression (including built-in functions) is
listed in this ordinal. We transfer only the ordinal over to the server and
then the server uses ExpressionType to construct an empty Expression and
expression.readFields() to populate it. For UDFs, it seems like you'd need to
pass over the class name (and jar?) so that the server would know what class to
instantiate. I think you'd also need the tenant ID to be available from the
BooleanExpressionFilter which is the root of what would instantiate these on
the server side, so that you could get the right class loader.
bq. Another thing I couldn't find (but I suspect maybe you haven't done it
yet?) is the dynamic loading of the jars. For this, given your scoping of
functions to tenant ID, I think you'd want a different class loader per tenant
ID, plus one for the global tenant (i.e. when no tenant ID is specified).
I have handled these in the current patch. I added a UDFExpression in which we
write/read the tenant ID, class name, and jar path along with the expression.
This information is used to load the classes with the DynamicClassLoader
supported by HBase. The function jars should be placed in the HDFS directory
given by the hbase.dynamic.jars.dir configuration value; even if we create a
function with a jar in another HDFS location, it will still be loaded. As you
mentioned, there is a class loader per tenant ID, and the loaders are cached.
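The per-tenant class loader caching can be sketched roughly as below. This is a hypothetical simplification: `TenantClassLoaderCache`, `GLOBAL_TENANT`, and the use of `URLClassLoader` are stand-ins for illustration only; the actual patch uses HBase's DynamicClassLoader against the hbase.dynamic.jars.dir location.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: one class loader per tenant ID, cached so that
// repeated UDF deserialization on the server reuses the same loader.
// HBase's DynamicClassLoader would take the place of URLClassLoader here.
public class TenantClassLoaderCache {
    private static final String GLOBAL_TENANT = ""; // no tenant ID specified
    private final ConcurrentHashMap<String, ClassLoader> cache =
            new ConcurrentHashMap<>();

    public ClassLoader getLoader(String tenantId, URL[] jarUrls) {
        String key = (tenantId == null) ? GLOBAL_TENANT : tenantId;
        // computeIfAbsent guarantees exactly one loader per tenant key
        return cache.computeIfAbsent(key,
                k -> new URLClassLoader(jarUrls, getClass().getClassLoader()));
    }

    public int size() {
        return cache.size();
    }
}
```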
bq. Instead, I'd collect all UDFParseNodes at parse time by keeping a list
of these per statement.
I have not handled this yet, James. You mean that after parsing the statement
we need to check for all UDFParseNodes in the statement and keep them in a
list? Do we need a parse node visitor to find the UDFParseNodes? The parse
nodes can be anywhere (in the WHERE, SELECT, or HAVING clauses) depending on
the query. Do you have any idea how we can collect them all together for a
statement?
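A visitor-style traversal would look something like the sketch below. This is a hypothetical, much-simplified model of Phoenix's parse node tree, only to illustrate collecting every UDFParseNode regardless of which clause it sits in; the alternative James suggests is to append to a per-statement list inside the parser as each UDFParseNode is constructed, avoiding a second pass.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified stand-in for Phoenix's ParseNode hierarchy.
class ParseNode {
    private final List<ParseNode> children;
    ParseNode(List<ParseNode> children) { this.children = children; }
    List<ParseNode> getChildren() { return children; }
}

class UDFParseNode extends ParseNode {
    private final String functionName;
    UDFParseNode(String functionName) {
        super(Collections.<ParseNode>emptyList());
        this.functionName = functionName;
    }
    String getFunctionName() { return functionName; }
}

class UDFCollector {
    // Depth-first walk over the whole statement tree, so UDFs in any
    // clause (SELECT, WHERE, HAVING, ...) are found in one pass.
    static List<UDFParseNode> collect(ParseNode root) {
        List<UDFParseNode> found = new ArrayList<>();
        walk(root, found);
        return found;
    }

    private static void walk(ParseNode node, List<UDFParseNode> found) {
        if (node instanceof UDFParseNode) {
            found.add((UDFParseNode) node);
        }
        for (ParseNode child : node.getChildren()) {
            walk(child, found);
        }
    }
}
```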
bq. Then, modify MetaDataEndPoint.getTable() to accept a list of these and
resolve them all (use a SkipScanFilter) when the table is resolved.
You mean the getFunction() method? All functions are stored in the
SYSTEM.FUNCTION table, whereas getTable() is executed on the catalog table
only.
bq. Add the same split policy we have on SYSTEM.CATALOG on SYSTEM.FUNCTION to
ensure that all tables for a given tenant will be in the same region.
I have added this.
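The idea behind that split policy can be sketched as follows. This is hypothetical and simplified: the real policy plugs into HBase's split-policy API and Phoenix's actual row-key layout is more involved, but the core trick is to truncate any proposed split point back to the tenant-ID boundary so one tenant's rows never straddle two regions.

```java
import java.util.Arrays;

// Hypothetical sketch: row keys are assumed to be laid out as
// tenantId + 0x00 + rest. Trimming a proposed split point to just past
// the separator keeps all rows of a tenant in a single region.
public class TenantBoundarySplit {
    private static final byte SEPARATOR = 0;

    public static byte[] adjustSplitPoint(byte[] proposed) {
        for (int i = 0; i < proposed.length; i++) {
            if (proposed[i] == SEPARATOR) {
                // Split exactly at the end of the tenant ID prefix
                return Arrays.copyOf(proposed, i + 1);
            }
        }
        // No separator found: the key is all tenant ID, safe to use as-is
        return proposed;
    }
}
```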
bq. Some minor stuff: rename this to resolveFunction():
Done.
bq. Keep all ParseNodes immutable, so no UDFParseNode.setInfo(), but use a copy
constructor instead.
Done.
bq. An alternative which I've seen before is CREATE [OR REPLACE] instead of
the IF NOT EXISTS. What do other RDBMS support?
You are correct, James. Other RDBMSs support CREATE [OR REPLACE], and I have
changed the syntax accordingly. I will implement the REPLACE part while
handling ALTER FUNCTION (REPLACE by itself may be enough).
bq. Other than that, my only concern is the amount of copy/paste code between
the client/server caching we do for a table and now for a function.
In the latest patch, the PTableCache is shared to hold functions as well, and
the class has been renamed to PMetaDataCache.
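Sharing one metadata cache between tables and functions could be sketched as below. This is a minimal illustration, not Phoenix's actual PMetaDataCache: a single LRU map keyed by name that can hold either kind of metadata object, so the client/server caching logic exists in one place instead of being copy/pasted.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a shared metadata cache: one LRU structure that
// can store both table and function metadata by name.
public class PMetaDataCacheSketch<V> {
    private final Map<String, V> lru;

    public PMetaDataCacheSketch(final int maxEntries) {
        // accessOrder=true gives least-recently-used eviction ordering
        this.lru = new LinkedHashMap<String, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    public void put(String name, V metadata) { lru.put(name, metadata); }
    public V get(String name) { return lru.get(name); }
    public int size() { return lru.size(); }
}
```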
Please review the latest patch. Thanks.
> Support UDFs
> ------------
>
> Key: PHOENIX-538
> URL: https://issues.apache.org/jira/browse/PHOENIX-538
> Project: Phoenix
> Issue Type: Task
> Reporter: James Taylor
> Assignee: Rajeshbabu Chintaguntla
> Fix For: 5.0.0, 4.4.0
>
> Attachments: PHOENIX-538-wip.patch, PHOENIX-538_v1.patch
>
>
> Phoenix allows built-in functions to be added (as described
> [here](http://phoenix-hbase.blogspot.com/2013/04/how-to-add-your-own-built-in-function.html))
> with the restriction that they must be in the phoenix jar. We should improve
> on this and allow folks to declare new functions through a CREATE FUNCTION
> command like this:
> CREATE FUNCTION mdHash(anytype)
> RETURNS binary(16)
> LOCATION 'hdfs://path-to-my-jar' 'com.me.MDHashFunction'
> Since HBase supports loading jars dynamically, this would not be too
> difficult. The function implementation class would be required to extend our
> ScalarFunction base class. Here's how I could see it being implemented:
> * modify the phoenix grammar to support the new CREATE FUNCTION syntax
> * create a new UDFParseNode class to capture the parse state
> * add a new method to the MetaDataProtocol interface
> * add a new method in ConnectionQueryServices to invoke the MetaDataProtocol
> method
> * add a new method in MetaDataClient to invoke the ConnectionQueryServices
> method
> * persist functions in a new "SYSTEM.FUNCTION" table
> * add a new client-side representation to cache functions called PFunction
> * modify ColumnResolver to dynamically resolve a function in the same way we
> dynamically resolve and load a table
> * create and register a new ExpressionType called UDFExpression
> * at parse time, check for the function name in the built in list first (as
> is currently done), and if not found in the PFunction cache. If not found
> there, then use the new UDFExpression as a placeholder and have the
> ColumnResolver attempt to resolve it at compile time and throw an error if
> unsuccessful.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)