[jira] [Commented] (PHOENIX-538) Support UDFs

Samarth Jain (JIRA) Thu, 23 Apr 2015 11:09:14 -0700

    [ 
https://issues.apache.org/jira/browse/PHOENIX-538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509477#comment-14509477
 ]


Samarth Jain commented on PHOENIX-538:
--------------------------------------

Great work, [~rajeshbabu]. This is awesome! I have some initial feedback:

In your test class UserDefinedFunctionsIT, I see that in an @After method you 
are always closing the test driver. Is that intentional? 

{code}
+    @After
+    public void cleanUp() throws Exception {
+        destroyDriver(driver);
+    }
{code}

Also, you are setting the same property on the driver in every test method. 
{code}
+        Properties props = new Properties();
+        props.setProperty(QueryServices.ALLOW_USER_DEFINED_FUNCTIONS_ATTRIB, 
"true");
{code}

If you don't need to create a new driver for every test method, then you can do 
the following in your @BeforeClass method:
{code}
Map<String,String> props = Maps.newHashMapWithExpectedSize(1);
props.put(QueryServices.ALLOW_USER_DEFINED_FUNCTIONS_ATTRIB, "true");
setUpTestDriver(new ReadOnlyProps(props.entrySet().iterator()));
{code}

And the following in you @AfterClass method:
{code}
try {
          destroyDriver(driver)
} finally {
            util.shutdownMiniCluster();
}
{code}

In MetadataEndPointImpl.java, I don't see any callers that are passing non-null 
rowLocks list. You can probably get rid off the method.
{code}
private List<PFunction> doGetFunctions(List<byte[]> keys, long clientTimeStamp, 
List<RowLock> rowLocks) throws IOException, SQLException
{code}

In UDFExpression.java, the locking to make sure config is singleton is broken:

{code}
+    private void lazyInitialize() {
+        synchronized (this) {
+            if(config!=null) return;
+            config = HBaseConfiguration.create();
+        }
+    }
{code}

Different UdfExpression instances will have different "this" pointers. So you 
won't really be locking different threads from doing the initialization 
multiple times. Besides, do you really need to lazy initialize config? When 
class is loaded, you can just initialize config by doing this:
{code}
private static Configuration config = HBaseConfiguration.create();
{code}

In UDfExpression.java, method getClassLoader is it possible for tenantId or 
jarPath to be null? If yes, you probably need some null checks to be added. In 
particular, this will throw an NPE if tenantId is null:

{code}
Lock lock = locker.acquireLock(tenantId.getString());
{code}

Can you help me understand the reasoning behind the warning message:
{code}
+                    // Lost update race, use already added class loader
+                    LOG.warn("THIS SHOULD NOT HAPPEN, a class loader" + " is 
already cached for "
+                            + jarPath);
{code}
Isn't it perfectly acceptable to lose the update race? If yes, then that 
warning message can probably be removed. Also, is creating class loaders 
expensive? Do you want only one thread to create the class loader for a 
tenantId? If yes to both, then you should probably consider using Guava 
MapMaker's ComputingMapAdapter. Using that implementation, only one thread will 
be creating the class loader for a tenantId and other threads will wait till 
the computation is done.




> Support UDFs
> ------------
>
>                 Key: PHOENIX-538
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-538
>             Project: Phoenix
>          Issue Type: Task
>            Reporter: James Taylor
>            Assignee: Rajeshbabu Chintaguntla
>             Fix For: 5.0.0, 4.4.0
>
>         Attachments: PHOENIX-538-wip.patch, PHOENIX-538_v1.patch, 
> PHOENIX-538_v2.patch, PHOENIX-538_v3.patch, PHOENIX-538_v4.patch
>
>
> Phoenix allows built-in functions to be added (as described 
> [here](http://phoenix-hbase.blogspot.com/2013/04/how-to-add-your-own-built-in-function.html))
>  with the restriction that they must be in the phoenix jar. We should improve 
> on this and allow folks to declare new functions through a CREATE FUNCTION 
> command like this:
>       CREATE FUNCTION mdHash(anytype)
>       RETURNS binary(16)
>       LOCATION 'hdfs://path-to-my-jar' 'com.me.MDHashFunction'
> Since HBase supports loading jars dynamically, this would not be too 
> difficult. The function implementation class would be required to extend our 
> ScalarFunction base class. Here's how I could see it being implemented:
> * modify the phoenix grammar to support the new CREATE FUNCTION syntax
> * create a new UTFParseNode class to capture the parse state
> * add a new method to the MetaDataProtocol interface
> * add a new method in ConnectionQueryServices to invoke the MetaDataProtocol 
> method
> * add a new method in MetaDataClient to invoke the ConnectionQueryServices 
> method
> * persist functions in a new "SYSTEM.FUNCTION" table
> * add a new client-side representation to cache functions called PFunction
> * modify ColumnResolver to dynamically resolve a function in the same way we 
> dynamically resolve and load a table
> * create and register a new ExpressionType called UDFExpression
> * at parse time, check for the function name in the built in list first (as 
> is currently done), and if not found in the PFunction cache. If not found 
> there, then use the new UDFExpression as a placeholder and have the 
> ColumnResolver attempt to resolve it at compile time and throw an error if 
> unsuccessful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (PHOENIX-538) Support UDFs

Reply via email to