Hi François,

Config:
tmp: "/a_drillbit/local/tmp" -> "/tmp"
I suggest you set the tmp config to just "/tmp". With your configuration it
will point to an HDFS directory that is used only for storing your jars
during registration; it is not a local directory and is for Drill's
internal use only.
local: "/udf" -> "/a_drillbit/local/udf"
This is your actual local UDF directory. If you didn't override
$DRILL_TMP_DIR (which defaults to /tmp), you'll be able to find it under
"/tmp/a_drillbit/local/udf".

Jars:
When creating your UDF (for example, using the tutorial [1]), you'll use a
Maven plugin that generates two jars: binary and sources.
Both jars should be copied to your HDFS staging directory (do not rename
the binary jar into the sources jar; they have different content).
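
A minimal sketch of the copy step (the jar names are illustrative; the
staging path is the one from your config, and the hadoop commands are
commented out since they need an HDFS client):

```shell
# The Maven build from the tutorial produces both jars under target/.
BINARY_JAR=drill-test-udf-1.0.0.jar
SOURCES_JAR="${BINARY_JAR%.jar}-sources.jar"
echo "$SOURCES_JAR"

# Copy BOTH jars to the HDFS staging directory, keeping their names:
# hadoop fs -copyFromLocal "target/$BINARY_JAR"  /an_hdfs/drill_path/udf/staging/
# hadoop fs -copyFromLocal "target/$SOURCES_JAR" /an_hdfs/drill_path/udf/staging/
```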

Command:
Once the jars are copied, you can execute the UDF creation command (if a
jar with the same name has been registered before, you should execute the
drop command first).
During UDF registration, you need to indicate the binary jar name only
(the one without "-sources").
Once the registration command has executed successfully (you'll see the
list of registered UDFs), you may use these UDFs in queries.
Please note, you won't see the registered jars (binary and sources)
locally until lazy init fires; in your case, just execute a query using
the newly registered UDFs.
Once the query succeeds, you can check your local UDF directory and
you'll see your jars there. The local UDF directory is a temporary
directory; for example, when you stop the drillbit, it will be cleaned up.
When you start the drillbit again and issue a query using a dynamically
registered function, the jars will be loaded into your local UDF
directory again.
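
The whole sequence, using your jar name (the final query is only a
hypothetical illustration of triggering lazy init):

```sql
-- Drop first if a jar with this name was registered before:
DROP FUNCTION USING JAR 'drill.test_udf.jar';
-- Register using the binary jar name only; drill.test_udf-sources.jar
-- must also be in staging, but is not named here:
CREATE FUNCTION USING JAR 'drill.test_udf.jar';
-- First use copies the jars to each drillbit's local UDF directory:
SELECT hello_world('test') FROM (VALUES(1));
```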

Hope this helps to get you started.

Kind regards
Arina


[1] https://drill.apache.org/docs/tutorial-develop-a-simple-function/

On Fri, Dec 16, 2016 at 10:21 PM, François Méthot <fmetho...@gmail.com>
wrote:

> Hi,
>
>   Dynamic UDF is very neat new feature. We are trying to make it work on
> HDFS.
>
>
> we are using a config that looks like this:
>
> drill.exec.udf {
>   retry-attempts: 5,
>   directory : {
>     fs: "hdfs:/ourname.node:8020",
>     root: "/an_hdfs/drill_path/udf",
>     staging: "/staging",
>     registry: "/registry",
>     tmp: "/a_drillbit/local/tmp"
>     local: "/udf"
> }
> }
>
> We drop UDF jar in staging directory on hdfs.
>
> >hadoop fs -copyFromLocal drill_test.udf-1.0.0.jar
> /an_hdfs/drill_path/udf/staging/drill.test_udf.jar
>
> Then in Drill :
>
> CREATE FUNCTION USING JAR 'drill.test_udf.jar';
>
> 1st problem:
>   It returns:
>       Files does not exist:
> /an_hdfs/drill_path/udf/drill.test_udf-sources.jar
>
>
> So we copy the same file again (with added "-sources"):
> >hadoop fs -copyFromLocal drill_test.udf-1.0.0.jar
> /an_hdfs/drill_path/udf/drill.test_udf-sources.jar
>
> So that we have 2 identical files, one of which has "-sources"
>
>
> Redo the create function:
> CREATE FUNCTION USING JAR 'drill.test_udf.jar';
> This time it works:
>    The following  UDFs in jar drill.test_udf.jar have been registered:
>    [hello_word(VARCHAR_OPTIONAL), hello_world(VACHAR-REQUIRED)]
>
>    ( Note:  if we do CREATE FUNCTION USING JAR
> 'drill.test_udf-sources.jar', it complains it can't find
> 'drill.test_udf-sources-sources.jar')
>
> 2nd problem
>   We are unable to use the UDF,
>   No Match for function signature hello_world(<ANY>)...
>
> Before I debug the function code itself, I would like to make sure that
> the UDF is actually being seen locally by each drillbit.
>
> Based on the doc for "Local" property:
>    "The relative path concatenated to the Drill temporary directory to
> indicate the local UDF directory. "
>
> I was expecting Drill to copy the dynamic UDF jar from HDFS to a local
> dir of each drillbit in
> /a_drillbit/local/tmp/udf
>
> Is it where it should be, based on our config?
>
> Thanks
>
