[ https://issues.apache.org/jira/browse/HIVE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Philip Zeyliger updated HIVE-1157:
----------------------------------

    Attachment: hive-1157.patch.txt

This patch changes SessionState.java to copy jar resources locally if they're not local already. Because I had to manage additional per-resource state (namely, the location of the local copy, so that it can be cleaned up), I changed ResourceType to be a simple enum, and there is now one ResourceHook object per resource, not per resource type. I also changed the container map to an EnumMap. (A rough sketch of the copy-local behavior appears at the end of this message.)

It turns out that you can't specify an HDFS path to "-libjars", so I had to also modify ExecDriver.java to call a special method when it's getting jar resources.

I would appreciate some guidance on how best to test this. So far, I've manually done the following steps:

{noformat}
create table t (x int);
# Create a file containing "1\n2\n3\n" at /tmp/a.
load data local inpath '/tmp/a' into table t;
add jar hdfs://localhost:8020/Test.jar;
create temporary function cube as 'org.apache.hive.test.CubeSampleUDF'; # I wrote this
select cube(x) from t;
{noformat}

What else would it be reasonable for me to do? It looks like there's no DFS in the test environment. I might be able to register an ad-hoc file system implementation of some sort, or use Mockito, or some such (see the second sketch at the end of this message). What do you recommend?

I'm running the existing tests to make sure that I haven't broken anything. These seem to take a while, so I'll report back.

> UDFs can't be loaded via "add jar" when jar is on HDFS
> ------------------------------------------------------
>
>                 Key: HIVE-1157
>                 URL: https://issues.apache.org/jira/browse/HIVE-1157
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Philip Zeyliger
>            Priority: Minor
>         Attachments: hive-1157.patch.txt
>
>
> As discussed on the mailing list, it would be nice if you could use UDFs whose jars are on HDFS. The proposed implementation is for "add jar" to recognize that the target file is on HDFS, copy it locally, and load it into the classpath.
> {quote}
> Hi folks,
>
> I have a quick question about UDF support in Hive. I'm on the 0.5 branch. Can you use a UDF where the jar that contains the function is on HDFS rather than on the local filesystem? Specifically, the following does not seem to work:
>
> # This is Hive 0.5, from svn
> $ bin/hive
> Hive history file=/tmp/philip/hive_job_log_philip_201002081541_370227273.txt
> hive> add jar hdfs://localhost/FooTest.jar;
> Added hdfs://localhost/FooTest.jar to class path
> hive> create temporary function cube as 'com.cloudera.FooTestUDF';
> FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask
>
> Does this work for other people? I could probably fix it by changing "add jar" to download remote jars locally when necessary (to load them into the classpath), or by updating URLClassLoader (or whatever is underneath there) to read directly from HDFS, which seems more fragile. But I wanted to make sure that my interpretation of what's going on is right before I have at it.
>
> Thanks,
> -- Philip
> {quote}
> {quote}
> Yes, that's correct. I prefer to download the jars in "add jar".
>
> Zheng
> {quote}
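To make the copy-it-locally approach concrete, here is a rough sketch of the shape of the change. This is not the code from hive-1157.patch.txt; the ResourceDownloader class and downloadResource method names are invented for illustration, and only Hadoop's standard FileSystem API is assumed:

{noformat}
// Hypothetical sketch only; not the code in hive-1157.patch.txt.
import java.io.File;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ResourceDownloader {
  /**
   * Given a resource value like "hdfs://localhost:8020/Test.jar", return a
   * local path suitable for the classpath (and for "-libjars", which rejects
   * HDFS paths). Values that are already local are returned unchanged.
   */
  public static String downloadResource(String value, Configuration conf)
      throws Exception {
    URI uri = new URI(value);
    String scheme = uri.getScheme();
    if (scheme == null || scheme.equals("file")) {
      return value; // already local; nothing to copy or clean up
    }
    Path remote = new Path(value);
    // Copy to a temp file and remember its path per resource, so the
    // per-resource state (the ResourceHook in the patch) can delete it later.
    File local = File.createTempFile("hive-resource-", "-" + remote.getName());
    local.deleteOnExit();
    FileSystem fs = FileSystem.get(uri, conf);
    fs.copyToLocalFile(remote, new Path(local.getAbsolutePath()));
    return local.getAbsolutePath();
  }
}
{noformat}

Returning a genuinely local path is also what lets ExecDriver hand the jar to "-libjars", which is why it calls the special jar-resource method instead of using the raw hdfs:// value.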
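And here is one possible shape for the ad-hoc file system idea mentioned above: register a local file system under a made-up "remote" scheme, so the copy-to-local branch runs in a unit test without a running DFS. The fs.<scheme>.impl registration is ordinary Hadoop configuration; the "mock" scheme and MockRemoteFileSystem class are invented here for illustration:

{noformat}
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RawLocalFileSystem;

/**
 * All I/O goes to the local disk, but getUri() advertises a non-"file"
 * scheme, so code under test takes its remote-resource branch.
 */
public class MockRemoteFileSystem extends RawLocalFileSystem {
  @Override
  public URI getUri() {
    return URI.create("mock:///");
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Map the invented "mock" scheme to this class.
    conf.set("fs.mock.impl", MockRemoteFileSystem.class.getName());
    FileSystem fs = FileSystem.get(URI.create("mock:///"), conf);
    // A test could stage FooTest.jar on local disk, refer to it as
    // mock:///path/to/FooTest.jar, and assert that "add jar" copies it.
    System.out.println(fs.exists(new Path("mock:///tmp")));
  }
}
{noformat}

Mockito would work too, but a small FileSystem subclass like this keeps the test close to the real code path.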