[jira] Updated: (HIVE-1157) UDFs can't be loaded via "add jar" when jar is on HDFS
[ https://issues.apache.org/jira/browse/HIVE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Philip Zeyliger updated HIVE-1157:
    Attachment: HIVE-1157.patch.v6.txt

> UDFs can't be loaded via "add jar" when jar is on HDFS
>
> Key: HIVE-1157
> URL: https://issues.apache.org/jira/browse/HIVE-1157
> Project: Hadoop Hive
> Issue Type: Improvement
> Components: Query Processor
> Reporter: Philip Zeyliger
> Priority: Minor
> Attachments: hive-1157.patch.txt, HIVE-1157.patch.v3.txt,
> HIVE-1157.patch.v4.txt, HIVE-1157.patch.v5.txt, HIVE-1157.patch.v6.txt,
> HIVE-1157.v2.patch.txt, output.txt
>
> As discussed on the mailing list, it would be nice if you could use UDFs that
> are on jars on HDFS. The proposed implementation would be for "add jar" to
> recognize that the target file is on HDFS, copy it locally, and load it into
> the classpath.
>
> {quote}
> Hi folks,
>
> I have a quick question about UDF support in Hive. I'm on the 0.5 branch.
> Can you use a UDF where the jar which contains the function is on HDFS, and
> not on the local filesystem. Specifically, the following does not seem to
> work:
>
> # This is Hive 0.5, from svn
> $bin/hive
> Hive history file=/tmp/philip/hive_job_log_philip_201002081541_370227273.txt
> hive> add jar hdfs://localhost/FooTest.jar;
> Added hdfs://localhost/FooTest.jar to class path
> hive> create temporary function cube as 'com.cloudera.FooTestUDF';
> FAILED: Execution Error, return code 1 from
> org.apache.hadoop.hive.ql.exec.FunctionTask
>
> Does this work for other people? I could probably fix it by changing "add
> jar" to download remote jars locally, when necessary (to load them into the
> classpath), or update URLClassLoader (or whatever is underneath there) to
> read directly from HDFS, which seems a bit more fragile. But I wanted to
> make sure that my interpretation of what's going on is right before I have at
> it.
>
> Thanks,
> -- Philip
> {quote}
>
> {quote}
> Yes that's correct. I prefer to download the jars in "add jar".
> Zheng
> {quote}

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
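The approach proposed in the issue description above ("add jar" recognizes a remote target, copies it locally, then loads it into the classpath) might be sketched roughly as follows. This is only an illustration using the Java standard library, not the code in the attached patches: the class AddJarSketch, the RemoteFetcher interface, and the method names are all hypothetical, and a real implementation would presumably open the remote file through Hadoop's FileSystem API rather than through the stand-in fetcher used here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

final class AddJarSketch {
    /** Hypothetical stand-in for whatever opens a stream on the remote filesystem. */
    interface RemoteFetcher {
        InputStream open(URI uri) throws IOException;
    }

    /** A URI with no scheme, or with the "file" scheme, is treated as local. */
    static boolean isRemote(URI uri) {
        String scheme = uri.getScheme();
        return scheme != null && !scheme.equalsIgnoreCase("file");
    }

    /** Returns a local path for the jar, copying it into scratchDir if it is remote. */
    static Path localize(URI jar, Path scratchDir, RemoteFetcher fetcher) throws IOException {
        if (!isRemote(jar)) {
            return jar.getScheme() == null ? Paths.get(jar.getPath()) : Paths.get(jar);
        }
        Files.createDirectories(scratchDir); // no-op if the directory already exists
        Path local = Files.createTempFile(scratchDir, "addjar-", ".jar");
        try (InputStream in = fetcher.open(jar)) {
            Files.copy(in, local, StandardCopyOption.REPLACE_EXISTING);
        }
        return local;
    }

    /** Builds a class loader that can see the local copy of the jar. */
    static URLClassLoader classLoaderFor(Path localJar, ClassLoader parent) throws IOException {
        return new URLClassLoader(new URL[] { localJar.toUri().toURL() }, parent);
    }
}
```

The point of copying first is that URLClassLoader only needs to handle a plain file: URL, which avoids teaching the class loader about hdfs:// URLs (the option described in the thread as more fragile).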
[jira] Updated: (HIVE-1157) UDFs can't be loaded via "add jar" when jar is on HDFS
Ning Zhang updated HIVE-1157:
    Status: Open (was: Patch Available)
[jira] Updated: (HIVE-1157) UDFs can't be loaded via "add jar" when jar is on HDFS
Carl Steinbach updated HIVE-1157:
    Status: Patch Available (was: Open)
[jira] Updated: (HIVE-1157) UDFs can't be loaded via "add jar" when jar is on HDFS
Carl Steinbach updated HIVE-1157:
    Attachment: HIVE-1157.patch.v5.txt

Attaching an updated version of Phil's patch that applies cleanly with -p0
[jira] Updated: (HIVE-1157) UDFs can't be loaded via "add jar" when jar is on HDFS
Philip Zeyliger updated HIVE-1157:
    Attachment: HIVE-1157.patch.v4.txt

Carl, I updated the patch to current trunk.
[jira] Updated: (HIVE-1157) UDFs can't be loaded via "add jar" when jar is on HDFS
Edward Capriolo updated HIVE-1157:
    Attachment: output.txt
[jira] Updated: (HIVE-1157) UDFs can't be loaded via "add jar" when jar is on HDFS
Philip Zeyliger updated HIVE-1157:
    Attachment: HIVE-1157.patch.v3.txt

Ed,

Indeed, I've been able to reproduce that. I traced it down to some bad error handling when scratch_dir doesn't exist. The new patch creates a scratch dir if it doesn't already exist, and adds an if/else to make sure localFile.delete() isn't called if localFile is null.

Sorry about that. I'm not sure whether something changed on trunk between when I created the patch and now that affects how the scratch dir works, or whether the scratch dir had been created by other tests in my local checkout. Either way, this should fix it.

Thanks!
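The two fixes described for patch v3 (create the scratch dir when it is missing, and skip the delete when no local file was ever created) might look something like the sketch below. It is an assumption-laden illustration, not the patch itself: the class ScratchDirSketch and both method names are hypothetical, and only the variable name localFile comes from the message above.

```java
import java.io.File;
import java.io.IOException;

final class ScratchDirSketch {
    /** Creates the scratch dir (and any missing parents) if it does not already exist. */
    static File ensureScratchDir(File scratchDir) throws IOException {
        if (!scratchDir.isDirectory() && !scratchDir.mkdirs()) {
            throw new IOException("Could not create scratch dir " + scratchDir);
        }
        return scratchDir;
    }

    /**
     * Deletes the local copy if there is one. A null localFile means the copy
     * was never made (for example, because an earlier step failed), so there
     * is nothing to clean up. Returns true only if a file was deleted.
     */
    static boolean deleteLocalCopy(File localFile) {
        if (localFile == null) {
            return false;
        } else {
            return localFile.delete();
        }
    }
}
```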
[jira] Updated: (HIVE-1157) UDFs can't be loaded via "add jar" when jar is on HDFS
Philip Zeyliger updated HIVE-1157:
    Attachment: HIVE-1157.v2.patch.txt

I've uploaded a new patch with a bug fix (wasn't unregistering the jars correctly) and with a test. The test starts a MiniDFSCluster and runs add jar and delete jar explicitly, without using the ".q" framework. I felt this was the best way to test just the new behavior.
[jira] Updated: (HIVE-1157) UDFs can't be loaded via "add jar" when jar is on HDFS
Philip Zeyliger updated HIVE-1157:
    Attachment: hive-1157.patch.txt

This patch changes SessionState.java to copy jar resources locally, if they're not local already. Because I had to manage additional per-resource state (namely, the location of the local copy, so that it can be cleaned up), I modified the ResourceType enum to be simply an enum, and now there is one ResourceHook object per resource, not per resource type. I changed the container map to be an EnumMap.

It turns out that you can't specify an HDFS path to "-libjars", so I had to also modify ExecDriver.java to call a special method when it's getting jar resources.

I would appreciate some guidance on how to test this best. So far, I've manually done the following steps:

{noformat}
create table t (x int);
# Create a file with "1\n2\n3\n" as /tmp/a.
load data local inpath '/tmp/a' into table t;
add jar hdfs://localhost:8020/Test.jar;
create temporary function cube as 'org.apache.hive.test.CubeSampleUDF';  # I wrote this
select cube(x) from t;
{noformat}

What else would it be reasonable for me to do? It looks like there's no DFS in the test environment. I might be able to register an ad-hoc file system implementation of some sort or use mockito or some such... What do you recommend?

I'm running the existing tests to make sure that I haven't broken anything. These seem to take a while, so I'll report back.
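The bookkeeping described in this first patch (an EnumMap keyed by resource type, with one object per resource that remembers where its local copy lives) could be sketched as below. The names ResourceRegistrySketch and ResourceEntry, the two enum constants, and the method signatures are all hypothetical, chosen for illustration; they are not taken from SessionState.java or from the patch.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ResourceRegistrySketch {
    /** Hypothetical stand-in for the resource types a session tracks. */
    enum ResourceType { FILE, JAR }

    /** Hypothetical per-resource state: the name the user gave and the local copy, if any. */
    static final class ResourceEntry {
        final String original;
        final String localCopy; // null when the resource was already local

        ResourceEntry(String original, String localCopy) {
            this.original = original;
            this.localCopy = localCopy;
        }
    }

    // One inner map per resource type; insertion order is kept for listing.
    private final Map<ResourceType, Map<String, ResourceEntry>> byType =
        new EnumMap<>(ResourceType.class);

    void add(ResourceType type, String original, String localCopy) {
        byType.computeIfAbsent(type, t -> new LinkedHashMap<>())
              .put(original, new ResourceEntry(original, localCopy));
    }

    /** Unregisters a resource and returns its entry so the caller can delete the local copy. */
    ResourceEntry remove(ResourceType type, String original) {
        Map<String, ResourceEntry> entries = byType.get(type);
        return entries == null ? null : entries.remove(original);
    }

    List<String> list(ResourceType type) {
        Map<String, ResourceEntry> entries = byType.get(type);
        return entries == null ? new ArrayList<>() : new ArrayList<>(entries.keySet());
    }
}
```

Keeping state per resource rather than per type is what makes cleanup possible: when a jar is removed, the returned entry tells the caller which local file, if any, to delete.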