[ https://issues.apache.org/jira/browse/HIVE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Philip Zeyliger updated HIVE-1157:
----------------------------------

    Attachment: hive-1157.patch.txt

This patch changes SessionState.java to copy jar resources to the local 
filesystem when they are not already local.
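
The core of the change is a scheme check on the resource URI.  A minimal sketch of the idea (not the exact patch; the class and method names here are mine):
{code:java}
import java.io.File;
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ResourceLocalizer {
  /** Returns a local path for the resource, downloading it first if necessary. */
  public static String localizeResource(String value, Configuration conf)
      throws IOException {
    URI uri = new Path(value).toUri();
    if (uri.getScheme() == null || "file".equals(uri.getScheme())) {
      return value; // already on the local filesystem; nothing to copy
    }
    Path remote = new Path(value);
    FileSystem fs = remote.getFileSystem(conf);
    File local = File.createTempFile("hive-resource-", "-" + remote.getName());
    // The real patch records this path per resource so it can be cleaned up.
    fs.copyToLocalFile(remote, new Path(local.getAbsolutePath()));
    return local.getAbsolutePath();
  }
}
{code}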

Because I had to manage additional per-resource state (namely, the location of 
the local copy, so that it can be cleaned up later), I changed ResourceType to 
be a plain enum; there is now one ResourceHook object per resource rather than 
per resource type, and the container map is now an EnumMap.
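
Roughly, the bookkeeping now has the following shape (a sketch under my own naming; ResourceEntry stands in for the per-resource state that the patch's ResourceHook carries):
{code:java}
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;

public class SessionResources {
  /** Now a plain enum; per-type behavior moved into per-resource objects. */
  public enum ResourceType { FILE, JAR, ARCHIVE }

  /** One record per resource, not per resource type. */
  static class ResourceEntry {
    final String value;     // what the user passed to "add jar"
    final String localPath; // where the local copy lives, for later cleanup
    ResourceEntry(String value, String localPath) {
      this.value = value;
      this.localPath = localPath;
    }
  }

  // The container map is an EnumMap keyed by resource type.
  private final Map<ResourceType, Map<String, ResourceEntry>> resources =
      new EnumMap<ResourceType, Map<String, ResourceEntry>>(ResourceType.class);

  void add(ResourceType type, ResourceEntry entry) {
    Map<String, ResourceEntry> byValue = resources.get(type);
    if (byValue == null) {
      byValue = new HashMap<String, ResourceEntry>();
      resources.put(type, byValue);
    }
    byValue.put(entry.value, entry);
  }
}
{code}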

It turns out that you can't pass an HDFS path to "-libjars", so I also had to 
modify ExecDriver.java to call a special method when it gathers jar resources.
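
The gist on the ExecDriver side is to hand the job the local copies through the "tmpjars" property, which is what "-libjars" populates under the hood.  A sketch with an illustrative helper, not the patch's actual method:
{code:java}
import java.util.List;

import org.apache.hadoop.mapred.JobConf;

public class LibJarsHelper {
  /** Joins the local jar copies into the property that -libjars would set. */
  static void addLocalJars(JobConf job, List<String> localJarPaths) {
    StringBuilder sb = new StringBuilder();
    for (String jar : localJarPaths) {
      if (sb.length() > 0) {
        sb.append(',');
      }
      sb.append(jar); // each entry must be a local path
    }
    job.set("tmpjars", sb.toString());
  }
}
{code}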

I would appreciate some guidance on how to test this best.  So far, I've 
manually done the following steps:
{noformat}
create table t (x int);
# Create a file with "1\n2\n3\n" as /tmp/a.
load data local inpath '/tmp/a' into table t;
add jar hdfs://localhost:8020/Test.jar;
create temporary function cube as 'org.apache.hive.test.CubeSampleUDF';  # I wrote this
select cube(x) from t;
{noformat}
What else would it be reasonable for me to do?  It looks like there's no DFS in 
the test environment.  I might be able to register an ad-hoc FileSystem 
implementation of some sort, or use Mockito; one possibility is sketched below.  
What do you recommend?
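
One such ad-hoc approach (my assumption about a possible test, not something the patch does): register a RawLocalFileSystem subclass under a fake scheme, so that "add jar mockfs:///..." exercises the download path without a real DFS:
{code:java}
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.RawLocalFileSystem;

/** The local filesystem masquerading under a "mockfs" scheme, for tests only. */
public class MockDfs extends RawLocalFileSystem {
  @Override
  public URI getUri() {
    return URI.create("mockfs:///"); // report the fake scheme
  }

  /** Wire it up in test setup through Hadoop's fs.SCHEME.impl mechanism. */
  public static void register(Configuration conf) {
    conf.set("fs.mockfs.impl", MockDfs.class.getName());
  }
}
{code}
With that registered, a test could copy a jar somewhere local and add it as mockfs:///path/to/Test.jar.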

I'm running the existing tests to make sure that I haven't broken anything.  
These seem to take a while, so I'll report back.

> UDFs can't be loaded via "add jar" when jar is on HDFS
> ------------------------------------------------------
>
>                 Key: HIVE-1157
>                 URL: https://issues.apache.org/jira/browse/HIVE-1157
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Philip Zeyliger
>            Priority: Minor
>         Attachments: hive-1157.patch.txt
>
>
> As discussed on the mailing list, it would be nice if you could use UDFs that 
> are in jars on HDFS.  The proposed implementation would be for "add jar" to 
> recognize that the target file is on HDFS, copy it locally, and load it into 
> the classpath.
> {quote}
> Hi folks,
> I have a quick question about UDF support in Hive.  I'm on the 0.5 branch.  
> Can you use a UDF where the jar containing the function is on HDFS rather 
> than on the local filesystem?  Specifically, the following does not seem to 
> work:
> # This is Hive 0.5, from svn
> $ bin/hive
> Hive history file=/tmp/philip/hive_job_log_philip_201002081541_370227273.txt
> hive> add jar hdfs://localhost/FooTest.jar;
> Added hdfs://localhost/FooTest.jar to class path
> hive> create temporary function cube as 'com.cloudera.FooTestUDF';
> FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask
> Does this work for other people?  I could probably fix it by changing "add 
> jar" to download remote jars locally when necessary (to load them into the 
> classpath), or by updating URLClassLoader (or whatever is underneath there) 
> to read directly from HDFS, which seems a bit more fragile.  But I wanted to 
> make sure that my interpretation of what's going on is right before I have 
> at it.
> Thanks,
> -- Philip
> {quote}
> {quote}
> Yes that's correct. I prefer to download the jars in "add jar".
> Zheng
> {quote}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
