[
https://issues.apache.org/jira/browse/HIVE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Philip Zeyliger updated HIVE-1157:
----------------------------------
Attachment: hive-1157.patch.txt
This patch changes SessionState.java to copy jar resources locally, if they're
not local already.
Because I needed to track additional per-resource state (namely the location of
the local copy, so that it can be cleaned up later), I changed the ResourceType
enum to be a plain enum; there is now one ResourceHook object per resource
rather than per resource type, and the container map is now an EnumMap.
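To make the bookkeeping concrete, the new container has roughly the shape
below. This is only a sketch; the class and field names here (ResourceEntry,
ResourceMapSketch, etc.) are invented for illustration and are not the exact
names in the patch.
{noformat}
// Illustrative only -- the actual names in SessionState.java differ.
import java.util.EnumMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

enum ResourceType { FILE, JAR, ARCHIVE }

// Hypothetical per-resource record: the value the user added plus the path
// of the local copy (null if it was already local), kept so it can be
// deleted on cleanup.
class ResourceEntry {
  final String value;
  final String localCopy;
  ResourceEntry(String value, String localCopy) {
    this.value = value;
    this.localCopy = localCopy;
  }
}

class ResourceMapSketch {
  // One set of entries per resource type, keyed by an EnumMap.
  private final Map<ResourceType, Set<ResourceEntry>> resourceMap =
      new EnumMap<ResourceType, Set<ResourceEntry>>(ResourceType.class);

  void add(ResourceType type, ResourceEntry entry) {
    Set<ResourceEntry> entries = resourceMap.get(type);
    if (entries == null) {
      entries = new HashSet<ResourceEntry>();
      resourceMap.put(type, entries);
    }
    entries.add(entry);
  }
}
{noformat}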
It turns out that you can't pass an HDFS path to "-libjars", so I also had to
modify ExecDriver.java to call a special method when it is fetching jar
resources.
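For reference, the core of the "copy locally if not local" step can be
sketched with the standard Hadoop FileSystem API as below. The class and
method names, and the temp-file handling, are placeholders for discussion,
not what hive-1157.patch.txt literally adds.
{noformat}
// Sketch only: copy a jar to the local filesystem if its path is not local.
import java.io.File;
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LocalizeJar {
  public static String localizeIfNeeded(String value, Configuration conf)
      throws IOException {
    Path src = new Path(value);
    String scheme = src.toUri().getScheme();
    if (scheme == null || "file".equals(scheme)) {
      return value;                      // already local, nothing to copy
    }
    FileSystem fs = src.getFileSystem(conf);
    File local = File.createTempFile("hive-resource-", ".jar");
    local.delete();                      // let copyToLocalFile write it fresh
    fs.copyToLocalFile(src, new Path(local.getAbsolutePath()));
    // The caller records local.getAbsolutePath() so the copy can be removed
    // when the resource is deleted or the session ends.
    return local.getAbsolutePath();
  }
}
{noformat}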
I would appreciate some guidance on how best to test this. So far, I've
manually run through the following steps:
{noformat}
create table t (x int);
# Create a file with "1\n2\n3\n" as /tmp/a.
load data local inpath '/tmp/a' into table t;
add jar hdfs://localhost:8020/Test.jar;
create temporary function cube as 'org.apache.hive.test.CubeSampleUDF'; # I wrote this UDF
select cube(x) from t;
{noformat}
What else would it be reasonable for me to do? It looks like there's no DFS in
the test environment. I might be able to register an ad-hoc file system
implementation of some sort (see the sketch below), or use Mockito, or some
such... What do you recommend?
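If it helps the discussion, one way to fake a DFS without a running cluster
would be to register a local-filesystem stub under a made-up scheme. The
class name and scheme below are invented for illustration; this is not code
from the patch.
{noformat}
// Invented example: a RawLocalFileSystem that answers to a fake
// "fakedfs://" scheme, so the copy-to-local code path can be exercised in a
// unit test without HDFS.
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RawLocalFileSystem;

public class FakeDfsFileSystem extends RawLocalFileSystem {
  @Override
  public URI getUri() {
    return URI.create("fakedfs:///");
  }
}

// In the test setup:
//   Configuration conf = new Configuration();
//   conf.set("fs.fakedfs.impl", FakeDfsFileSystem.class.getName());
//   FileSystem fs = new Path("fakedfs:///tmp/Test.jar").getFileSystem(conf);
{noformat}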
I'm running the existing tests to make sure that I haven't broken anything.
These seem to take a while, so I'll report back.
> UDFs can't be loaded via "add jar" when jar is on HDFS
> ------------------------------------------------------
>
> Key: HIVE-1157
> URL: https://issues.apache.org/jira/browse/HIVE-1157
> Project: Hadoop Hive
> Issue Type: Improvement
> Components: Query Processor
> Reporter: Philip Zeyliger
> Priority: Minor
> Attachments: hive-1157.patch.txt
>
>
> As discussed on the mailing list, it would be nice if you could use UDFs whose
> jars live on HDFS. The proposed implementation would be for "add jar" to
> recognize that the target file is on HDFS, copy it locally, and load it into
> the classpath.
> {quote}
> Hi folks,
> I have a quick question about UDF support in Hive. I'm on the 0.5 branch.
> Can you use a UDF when the jar that contains the function is on HDFS rather
> than on the local filesystem? Specifically, the following does not seem to
> work:
> # This is Hive 0.5, from svn
> $ bin/hive
> Hive history file=/tmp/philip/hive_job_log_philip_201002081541_370227273.txt
> hive> add jar hdfs://localhost/FooTest.jar;
>
> Added hdfs://localhost/FooTest.jar to class path
> hive> create temporary function cube as 'com.cloudera.FooTestUDF';
>
> FAILED: Execution Error, return code 1 from
> org.apache.hadoop.hive.ql.exec.FunctionTask
> Does this work for other people? I could probably fix it by changing "add
> jar" to download remote jars locally, when necessary (to load them into the
> classpath), or by updating URLClassLoader (or whatever is underneath there)
> to read directly from HDFS, which seems a bit more fragile. But I wanted to
> make sure that my interpretation of what's going on is right before I have at
> it.
> Thanks,
> -- Philip
> {quote}
> {quote}
> Yes, that's correct. I prefer to download the jars in "add jar".
> Zheng
> {quote}
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.