[jira] Commented: (HIVE-1692) FetchOperator.getInputFormatFromCache hides causal exception
[ https://issues.apache.org/jira/browse/HIVE-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918422#action_12918422 ]

Philip Zeyliger commented on HIVE-1692:
---

BTW, to illustrate what a difference 3 characters make, compare debugging the following two errors:

(no patch)
{noformat}
10/10/05 21:55:39 ERROR CliDriver: Failed with exception java.io.IOException:java.io.IOException: Cannot create an instance of InputFormat class org.apache.hadoop.mapred.TextInputFormat as specified in mapredWork!
java.io.IOException: java.io.IOException: Cannot create an instance of InputFormat class org.apache.hadoop.mapred.TextInputFormat as specified in mapredWork!
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:271)
    at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:113)
    at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:657)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:131)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
Caused by: java.io.IOException: Cannot create an instance of InputFormat class org.apache.hadoop.mapred.TextInputFormat as specified in mapredWork!
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getInputFormatFromCache(FetchOperator.java:113)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:214)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:250)
    ... 10 more
{noformat}

(patch)
{noformat}
10/10/05 21:54:03 ERROR CliDriver: Failed with exception java.io.IOException:java.io.IOException: Cannot create an instance of InputFormat class org.apache.hadoop.mapred.TextInputFormat as specified in mapredWork!
java.io.IOException: java.io.IOException: Cannot create an instance of InputFormat class org.apache.hadoop.mapred.TextInputFormat as specified in mapredWork!
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:271)
    at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:113)
    at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:657)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:131)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
Caused by: java.io.IOException: Cannot create an instance of InputFormat class org.apache.hadoop.mapred.TextInputFormat as specified in mapredWork!
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getInputFormatFromCache(FetchOperator.java:113)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:214)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:250)
    ... 10 more
Caused by: java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getInputFormatFromCache(FetchOperator.java:109)
    ... 12 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
    ... 15 more
Caused by: java.lang.IllegalArgumentException: Compression codec com.hadoop.compression.lzo.LzoCodec not found.
    at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:96)
    at org.apache.hadoop.io.com
[jira] Updated: (HIVE-1692) FetchOperator.getInputFormatFromCache hides causal exception
[ https://issues.apache.org/jira/browse/HIVE-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Philip Zeyliger updated HIVE-1692:
--
    Attachment: HIVE-1692.patch.txt

I'll spare folks downloading the attachment:

{noformat}
-  + inputFormatClass.getName() + " as specified in mapredWork!");
+  + inputFormatClass.getName() + " as specified in mapredWork!", e);
{noformat}

> FetchOperator.getInputFormatFromCache hides causal exception
>
>
>                 Key: HIVE-1692
>                 URL: https://issues.apache.org/jira/browse/HIVE-1692
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>    Affects Versions: 0.7.0
>            Reporter: Philip Zeyliger
>            Priority: Minor
>             Fix For: 0.7.0
>
>         Attachments: HIVE-1692.patch.txt
>
>
> There's a line in FetchOperator.getInputFormatFromCache that catches all
> exceptions and re-throws IOException instead, hiding the original cause. I
> ran into this, naturally, and wish to fix it. Patch below is trivial.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
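For readers who want to see why the extra ", e" matters, here is a minimal, self-contained sketch of the same catch-and-rethrow pattern. It is not Hive's code: the class CauseChainingSketch and its methods are hypothetical stand-ins, and the only real API relied on is the standard java.io.IOException(String, Throwable) constructor.

```java
import java.io.IOException;

// Hypothetical stand-in for the catch-and-rethrow pattern discussed above;
// none of these names come from Hive itself.
final class CauseChainingSketch {

    // Simulates a failure deep inside object configuration.
    static Object configure(String className) {
        throw new IllegalArgumentException("Compression codec " + className + " not found.");
    }

    // Without the cause: the original exception is dropped on the floor.
    static Object createHidingCause(String className) throws IOException {
        try {
            return configure(className);
        } catch (Exception e) {
            throw new IOException("Cannot create an instance of " + className);
        }
    }

    // With the cause: passing e as the second argument preserves the chain,
    // so the stack trace gains the "Caused by:" sections.
    static Object createKeepingCause(String className) throws IOException {
        try {
            return configure(className);
        } catch (Exception e) {
            throw new IOException("Cannot create an instance of " + className, e);
        }
    }

    public static void main(String[] args) {
        try {
            createHidingCause("LzoCodec");
        } catch (IOException e) {
            System.out.println("hidden cause:    " + e.getCause());
        }
        try {
            createKeepingCause("LzoCodec");
        } catch (IOException e) {
            System.out.println("preserved cause: " + e.getCause());
        }
    }
}
```

In the first method getCause() returns null, which is what leaves the trace ending at the IOException; in the second, the IllegalArgumentException remains reachable through getCause() and is printed as part of the trace.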
[jira] Created: (HIVE-1692) FetchOperator.getInputFormatFromCache hides causal exception
FetchOperator.getInputFormatFromCache hides causal exception

                 Key: HIVE-1692
                 URL: https://issues.apache.org/jira/browse/HIVE-1692
             Project: Hadoop Hive
          Issue Type: Improvement
          Components: Query Processor
    Affects Versions: 0.7.0
            Reporter: Philip Zeyliger
            Priority: Minor
             Fix For: 0.7.0

There's a line in FetchOperator.getInputFormatFromCache that catches all exceptions and re-throws IOException instead, hiding the original cause. I ran into this, naturally, and wish to fix it. Patch below is trivial.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1157) UDFs can't be loaded via "add jar" when jar is on HDFS
[ https://issues.apache.org/jira/browse/HIVE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Zeyliger updated HIVE-1157: -- Attachment: HIVE-1157.patch.v6.txt > UDFs can't be loaded via "add jar" when jar is on HDFS > -- > > Key: HIVE-1157 > URL: https://issues.apache.org/jira/browse/HIVE-1157 > Project: Hadoop Hive > Issue Type: Improvement > Components: Query Processor >Reporter: Philip Zeyliger >Priority: Minor > Attachments: hive-1157.patch.txt, HIVE-1157.patch.v3.txt, > HIVE-1157.patch.v4.txt, HIVE-1157.patch.v5.txt, HIVE-1157.patch.v6.txt, > HIVE-1157.v2.patch.txt, output.txt > > > As discussed on the mailing list, it would be nice if you could use UDFs that > are on jars on HDFS. The proposed implementation would be for "add jar" to > recognize that the target file is on HDFS, copy it locally, and load it into > the classpath. > {quote} > Hi folks, > I have a quick question about UDF support in Hive. I'm on the 0.5 branch. > Can you use a UDF where the jar which contains the function is on HDFS, and > not on the local filesystem. Specifically, the following does not seem to > work: > # This is Hive 0.5, from svn > $bin/hive > Hive history file=/tmp/philip/hive_job_log_philip_201002081541_370227273.txt > hive> add jar hdfs://localhost/FooTest.jar; > > Added hdfs://localhost/FooTest.jar to class path > hive> create temporary function cube as 'com.cloudera.FooTestUDF'; > > FAILED: Execution Error, return code 1 from > org.apache.hadoop.hive.ql.exec.FunctionTask > Does this work for other people? I could probably fix it by changing "add > jar" to download remote jars locally, when necessary (to load them into the > classpath), or update URLClassLoader (or whatever is underneath there) to > read directly from HDFS, which seems a bit more fragile. But I wanted to > make sure that my interpretation of what's going on is right before I have at > it. > Thanks, > -- Philip > {quote} > {quote} > Yes that's correct. 
I prefer to download the jars in "add jar". > Zheng > {quote} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1157) UDFs can't be loaded via "add jar" when jar is on HDFS
[ https://issues.apache.org/jira/browse/HIVE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916956#action_12916956 ] Philip Zeyliger commented on HIVE-1157: --- Namit, Thanks for the review. I've fixed the test failures. The one you pointed out was a missing log line from the results. And there was a second one having to do with relative paths. Oddly enough, however, when I tried to bring the changes up to current trunk, it turned out that HIVE-1624 conflicted, and, when I looked at it, it turns out to supply the same feature as this patch. I'll upload the fixed patch for posterity, but it looks like this issue is no longer necessary. -- Philip > UDFs can't be loaded via "add jar" when jar is on HDFS > -- > > Key: HIVE-1157 > URL: https://issues.apache.org/jira/browse/HIVE-1157 > Project: Hadoop Hive > Issue Type: Improvement > Components: Query Processor >Reporter: Philip Zeyliger >Priority: Minor > Attachments: hive-1157.patch.txt, HIVE-1157.patch.v3.txt, > HIVE-1157.patch.v4.txt, HIVE-1157.patch.v5.txt, HIVE-1157.v2.patch.txt, > output.txt > > > As discussed on the mailing list, it would be nice if you could use UDFs that > are on jars on HDFS. The proposed implementation would be for "add jar" to > recognize that the target file is on HDFS, copy it locally, and load it into > the classpath. > {quote} > Hi folks, > I have a quick question about UDF support in Hive. I'm on the 0.5 branch. > Can you use a UDF where the jar which contains the function is on HDFS, and > not on the local filesystem. 
Specifically, the following does not seem to > work: > # This is Hive 0.5, from svn > $bin/hive > Hive history file=/tmp/philip/hive_job_log_philip_201002081541_370227273.txt > hive> add jar hdfs://localhost/FooTest.jar; > > Added hdfs://localhost/FooTest.jar to class path > hive> create temporary function cube as 'com.cloudera.FooTestUDF'; > > FAILED: Execution Error, return code 1 from > org.apache.hadoop.hive.ql.exec.FunctionTask > Does this work for other people? I could probably fix it by changing "add > jar" to download remote jars locally, when necessary (to load them into the > classpath), or update URLClassLoader (or whatever is underneath there) to > read directly from HDFS, which seems a bit more fragile. But I wanted to > make sure that my interpretation of what's going on is right before I have at > it. > Thanks, > -- Philip > {quote} > {quote} > Yes that's correct. I prefer to download the jars in "add jar". > Zheng > {quote} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1157) UDFs can't be loaded via "add jar" when jar is on HDFS
[ https://issues.apache.org/jira/browse/HIVE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Zeyliger updated HIVE-1157: -- Attachment: HIVE-1157.patch.v4.txt Carl, I updated the patch to current trunk. > UDFs can't be loaded via "add jar" when jar is on HDFS > -- > > Key: HIVE-1157 > URL: https://issues.apache.org/jira/browse/HIVE-1157 > Project: Hadoop Hive > Issue Type: Improvement > Components: Query Processor >Reporter: Philip Zeyliger >Priority: Minor > Attachments: hive-1157.patch.txt, HIVE-1157.patch.v3.txt, > HIVE-1157.patch.v4.txt, HIVE-1157.v2.patch.txt, output.txt > > > As discussed on the mailing list, it would be nice if you could use UDFs that > are on jars on HDFS. The proposed implementation would be for "add jar" to > recognize that the target file is on HDFS, copy it locally, and load it into > the classpath. > {quote} > Hi folks, > I have a quick question about UDF support in Hive. I'm on the 0.5 branch. > Can you use a UDF where the jar which contains the function is on HDFS, and > not on the local filesystem. Specifically, the following does not seem to > work: > # This is Hive 0.5, from svn > $bin/hive > Hive history file=/tmp/philip/hive_job_log_philip_201002081541_370227273.txt > hive> add jar hdfs://localhost/FooTest.jar; > > Added hdfs://localhost/FooTest.jar to class path > hive> create temporary function cube as 'com.cloudera.FooTestUDF'; > > FAILED: Execution Error, return code 1 from > org.apache.hadoop.hive.ql.exec.FunctionTask > Does this work for other people? I could probably fix it by changing "add > jar" to download remote jars locally, when necessary (to load them into the > classpath), or update URLClassLoader (or whatever is underneath there) to > read directly from HDFS, which seems a bit more fragile. But I wanted to > make sure that my interpretation of what's going on is right before I have at > it. > Thanks, > -- Philip > {quote} > {quote} > Yes that's correct. 
I prefer to download the jars in "add jar". > Zheng > {quote} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1530) Include hive-default.xml and hive-log4j.properties in hive-common JAR
[ https://issues.apache.org/jira/browse/HIVE-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914528#action_12914528 ] Philip Zeyliger commented on HIVE-1530: --- +1. I'm a big fan of this change. We've repeatedly had customers using an old, weird, or non-existent hive-default, and that has caused issues that are quite tricky to debug. > Include hive-default.xml and hive-log4j.properties in hive-common JAR > - > > Key: HIVE-1530 > URL: https://issues.apache.org/jira/browse/HIVE-1530 > Project: Hadoop Hive > Issue Type: Improvement > Components: Configuration >Reporter: Carl Steinbach >Assignee: Carl Steinbach > Fix For: 0.7.0 > > Attachments: HIVE-1530.1.patch.txt > > > hive-common-*.jar should include hive-default.xml and hive-log4j.properties, > and similarly hive-exec-*.jar should include hive-exec-log4j.properties. The > hive-default.xml file that currently sits in the conf/ directory should be > removed. > Motivations for this change: > * We explicitly tell users that they should never modify hive-default.xml yet > give them the opportunity to do so by placing the file in the conf dir. > * Many users are familiar with the Hadoop configuration mechanism that does > not require *-default.xml files to be present in the HADOOP_CONF_DIR, and > assume that the same is true for HIVE_CONF_DIR. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1157) UDFs can't be loaded via "add jar" when jar is on HDFS
[ https://issues.apache.org/jira/browse/HIVE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850628#action_12850628 ]

Philip Zeyliger commented on HIVE-1157:
---

Edward,

I'm having trouble reproducing the error you're seeing.

{quote}
create temporary function geoip as 'com.jointhegrid.hive.udf.GenericUDFGeoIP';
hive> select geoip(theIp ,'COUNTRY_NAME', './GeoLiteCity.dat.gz' ) from ip ;
java.lang.ClassNotFoundException: com.jointhegrid.hive.udf.GenericUDFGeoIP
Continuing ...
{quote}

On my machine, if I create a temporary function with a class name that doesn't exist, it fails. So it makes no sense to me that "create temporary function" is succeeding, but the class is then immediately not found. Do you have any theories on what's going on? Can you try to run it with debug on?

Thanks!

> UDFs can't be loaded via "add jar" when jar is on HDFS
> --
>
>                 Key: HIVE-1157
>                 URL: https://issues.apache.org/jira/browse/HIVE-1157
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Philip Zeyliger
>            Priority: Minor
>         Attachments: hive-1157.patch.txt, HIVE-1157.patch.v3.txt,
> HIVE-1157.v2.patch.txt, output.txt
>
>
> As discussed on the mailing list, it would be nice if you could use UDFs that
> are on jars on HDFS. The proposed implementation would be for "add jar" to
> recognize that the target file is on HDFS, copy it locally, and load it into
> the classpath.
> {quote}
> Hi folks,
> I have a quick question about UDF support in Hive. I'm on the 0.5 branch.
> Can you use a UDF where the jar which contains the function is on HDFS, and
> not on the local filesystem.
Specifically, the following does not seem to > work: > # This is Hive 0.5, from svn > $bin/hive > Hive history file=/tmp/philip/hive_job_log_philip_201002081541_370227273.txt > hive> add jar hdfs://localhost/FooTest.jar; > > Added hdfs://localhost/FooTest.jar to class path > hive> create temporary function cube as 'com.cloudera.FooTestUDF'; > > FAILED: Execution Error, return code 1 from > org.apache.hadoop.hive.ql.exec.FunctionTask > Does this work for other people? I could probably fix it by changing "add > jar" to download remote jars locally, when necessary (to load them into the > classpath), or update URLClassLoader (or whatever is underneath there) to > read directly from HDFS, which seems a bit more fragile. But I wanted to > make sure that my interpretation of what's going on is right before I have at > it. > Thanks, > -- Philip > {quote} > {quote} > Yes that's correct. I prefer to download the jars in "add jar". > Zheng > {quote} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1157) UDFs can't be loaded via "add jar" when jar is on HDFS
[ https://issues.apache.org/jira/browse/HIVE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Zeyliger updated HIVE-1157: -- Attachment: HIVE-1157.patch.v3.txt Ed, Indeed, I've been able to reproduce that. I traced it down to some bad error handling when scratch_dir doesn't exist. The new patch creates a scratch dir if it doesn't already exist, and adds an if/else to make sure localFile.delete() isn't called if localFile is null. Sorry about that. I'm not sure whether something changed between when I created the patch and now on trunk to change how the scratchdir works, or if I had the scratch dir created by other tests in my local checkout. Either way, this should fix it. Thanks! > UDFs can't be loaded via "add jar" when jar is on HDFS > -- > > Key: HIVE-1157 > URL: https://issues.apache.org/jira/browse/HIVE-1157 > Project: Hadoop Hive > Issue Type: Improvement > Components: Query Processor >Reporter: Philip Zeyliger >Priority: Minor > Attachments: hive-1157.patch.txt, HIVE-1157.patch.v3.txt, > HIVE-1157.v2.patch.txt > > > As discussed on the mailing list, it would be nice if you could use UDFs that > are on jars on HDFS. The proposed implementation would be for "add jar" to > recognize that the target file is on HDFS, copy it locally, and load it into > the classpath. > {quote} > Hi folks, > I have a quick question about UDF support in Hive. I'm on the 0.5 branch. > Can you use a UDF where the jar which contains the function is on HDFS, and > not on the local filesystem. 
Specifically, the following does not seem to > work: > # This is Hive 0.5, from svn > $bin/hive > Hive history file=/tmp/philip/hive_job_log_philip_201002081541_370227273.txt > hive> add jar hdfs://localhost/FooTest.jar; > > Added hdfs://localhost/FooTest.jar to class path > hive> create temporary function cube as 'com.cloudera.FooTestUDF'; > > FAILED: Execution Error, return code 1 from > org.apache.hadoop.hive.ql.exec.FunctionTask > Does this work for other people? I could probably fix it by changing "add > jar" to download remote jars locally, when necessary (to load them into the > classpath), or update URLClassLoader (or whatever is underneath there) to > read directly from HDFS, which seems a bit more fragile. But I wanted to > make sure that my interpretation of what's going on is right before I have at > it. > Thanks, > -- Philip > {quote} > {quote} > Yes that's correct. I prefer to download the jars in "add jar". > Zheng > {quote} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
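The two fixes described in the v3 comment (create the scratch dir when it is missing, and skip localFile.delete() when localFile is null) can be sketched roughly as follows. This is only an illustration using java.io.File; the class ScratchDirSketch and its method names are hypothetical and are not taken from the attached patch.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical illustration of the two guards described in the comment;
// the real patch lives in Hive and is not reproduced here.
final class ScratchDirSketch {

    // Creates the scratch directory (and any missing parents) if it does not
    // exist yet, and fails loudly instead of letting later code trip over it.
    static File ensureScratchDir(File scratchDir) throws IOException {
        if (!scratchDir.mkdirs() && !scratchDir.isDirectory()) {
            throw new IOException("Could not create scratch dir " + scratchDir);
        }
        return scratchDir;
    }

    // Deletes the local copy only when one was actually created.
    // Returns true if a file was deleted, false otherwise.
    static boolean cleanUp(File localFile) {
        if (localFile == null) {
            return false; // nothing was downloaded, so there is nothing to delete
        } else {
            return localFile.delete();
        }
    }

    public static void main(String[] args) throws IOException {
        File scratch = ensureScratchDir(
                new File(System.getProperty("java.io.tmpdir"), "scratch-sketch"));
        System.out.println("scratch dir exists: " + scratch.isDirectory());
        System.out.println("cleanUp(null): " + cleanUp(null));
    }
}
```

Checking mkdirs() first and isDirectory() second keeps the helper tolerant of another process creating the directory at the same moment.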
[jira] Commented: (HIVE-1157) UDFs can't be loaded via "add jar" when jar is on HDFS
[ https://issues.apache.org/jira/browse/HIVE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848218#action_12848218 ] Philip Zeyliger commented on HIVE-1157: --- Anyone care to take a look? > UDFs can't be loaded via "add jar" when jar is on HDFS > -- > > Key: HIVE-1157 > URL: https://issues.apache.org/jira/browse/HIVE-1157 > Project: Hadoop Hive > Issue Type: Improvement > Components: Query Processor >Reporter: Philip Zeyliger >Priority: Minor > Attachments: hive-1157.patch.txt, HIVE-1157.v2.patch.txt > > > As discussed on the mailing list, it would be nice if you could use UDFs that > are on jars on HDFS. The proposed implementation would be for "add jar" to > recognize that the target file is on HDFS, copy it locally, and load it into > the classpath. > {quote} > Hi folks, > I have a quick question about UDF support in Hive. I'm on the 0.5 branch. > Can you use a UDF where the jar which contains the function is on HDFS, and > not on the local filesystem. Specifically, the following does not seem to > work: > # This is Hive 0.5, from svn > $bin/hive > Hive history file=/tmp/philip/hive_job_log_philip_201002081541_370227273.txt > hive> add jar hdfs://localhost/FooTest.jar; > > Added hdfs://localhost/FooTest.jar to class path > hive> create temporary function cube as 'com.cloudera.FooTestUDF'; > > FAILED: Execution Error, return code 1 from > org.apache.hadoop.hive.ql.exec.FunctionTask > Does this work for other people? I could probably fix it by changing "add > jar" to download remote jars locally, when necessary (to load them into the > classpath), or update URLClassLoader (or whatever is underneath there) to > read directly from HDFS, which seems a bit more fragile. But I wanted to > make sure that my interpretation of what's going on is right before I have at > it. > Thanks, > -- Philip > {quote} > {quote} > Yes that's correct. I prefer to download the jars in "add jar". 
> Zheng > {quote} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1157) UDFs can't be loaded via "add jar" when jar is on HDFS
[ https://issues.apache.org/jira/browse/HIVE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Zeyliger updated HIVE-1157: -- Attachment: HIVE-1157.v2.patch.txt I've uploaded a new patch with a bug fix (wasn't unregistering the jars correctly) and with a test. The test starts a MiniDFSCluster and runs add jar and delete jar explicitly, without using the ".q" framework. I felt this was the best way to test just the new behavior. > UDFs can't be loaded via "add jar" when jar is on HDFS > -- > > Key: HIVE-1157 > URL: https://issues.apache.org/jira/browse/HIVE-1157 > Project: Hadoop Hive > Issue Type: Improvement > Components: Query Processor >Reporter: Philip Zeyliger >Priority: Minor > Attachments: hive-1157.patch.txt, HIVE-1157.v2.patch.txt > > > As discussed on the mailing list, it would be nice if you could use UDFs that > are on jars on HDFS. The proposed implementation would be for "add jar" to > recognize that the target file is on HDFS, copy it locally, and load it into > the classpath. > {quote} > Hi folks, > I have a quick question about UDF support in Hive. I'm on the 0.5 branch. > Can you use a UDF where the jar which contains the function is on HDFS, and > not on the local filesystem. Specifically, the following does not seem to > work: > # This is Hive 0.5, from svn > $bin/hive > Hive history file=/tmp/philip/hive_job_log_philip_201002081541_370227273.txt > hive> add jar hdfs://localhost/FooTest.jar; > > Added hdfs://localhost/FooTest.jar to class path > hive> create temporary function cube as 'com.cloudera.FooTestUDF'; > > FAILED: Execution Error, return code 1 from > org.apache.hadoop.hive.ql.exec.FunctionTask > Does this work for other people? I could probably fix it by changing "add > jar" to download remote jars locally, when necessary (to load them into the > classpath), or update URLClassLoader (or whatever is underneath there) to > read directly from HDFS, which seems a bit more fragile. 
But I wanted to > make sure that my interpretation of what's going on is right before I have at > it. > Thanks, > -- Philip > {quote} > {quote} > Yes that's correct. I prefer to download the jars in "add jar". > Zheng > {quote} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1157) UDFs can't be loaded via "add jar" when jar is on HDFS
[ https://issues.apache.org/jira/browse/HIVE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841987#action_12841987 ] Philip Zeyliger commented on HIVE-1157: --- Has anyone had a chance to look at this? Would appreciate the feedback! Thanks! > UDFs can't be loaded via "add jar" when jar is on HDFS > -- > > Key: HIVE-1157 > URL: https://issues.apache.org/jira/browse/HIVE-1157 > Project: Hadoop Hive > Issue Type: Improvement > Components: Query Processor >Reporter: Philip Zeyliger >Priority: Minor > Attachments: hive-1157.patch.txt > > > As discussed on the mailing list, it would be nice if you could use UDFs that > are on jars on HDFS. The proposed implementation would be for "add jar" to > recognize that the target file is on HDFS, copy it locally, and load it into > the classpath. > {quote} > Hi folks, > I have a quick question about UDF support in Hive. I'm on the 0.5 branch. > Can you use a UDF where the jar which contains the function is on HDFS, and > not on the local filesystem. Specifically, the following does not seem to > work: > # This is Hive 0.5, from svn > $bin/hive > Hive history file=/tmp/philip/hive_job_log_philip_201002081541_370227273.txt > hive> add jar hdfs://localhost/FooTest.jar; > > Added hdfs://localhost/FooTest.jar to class path > hive> create temporary function cube as 'com.cloudera.FooTestUDF'; > > FAILED: Execution Error, return code 1 from > org.apache.hadoop.hive.ql.exec.FunctionTask > Does this work for other people? I could probably fix it by changing "add > jar" to download remote jars locally, when necessary (to load them into the > classpath), or update URLClassLoader (or whatever is underneath there) to > read directly from HDFS, which seems a bit more fragile. But I wanted to > make sure that my interpretation of what's going on is right before I have at > it. > Thanks, > -- Philip > {quote} > {quote} > Yes that's correct. I prefer to download the jars in "add jar". 
> Zheng > {quote} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1157) UDFs can't be loaded via "add jar" when jar is on HDFS
[ https://issues.apache.org/jira/browse/HIVE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Philip Zeyliger updated HIVE-1157:
--
    Attachment: hive-1157.patch.txt

This patch changes SessionState.java to copy jar resources locally, if they're not local already. Because I had to manage additional per-resource state (namely, the location of the local copy, so that it can be cleaned up), I modified the ResourceType enum to be simply an enum, and now there is one ResourceHook object per resource, not per resource type. I changed the container map to be an EnumMap.

It turns out that you can't specify an HDFS path to "-libjars", so I had to also modify ExecDriver.java to call a special method when it's getting jar resources.

I would appreciate some guidance on how to test this best. So far, I've manually done the following steps:

{noformat}
create table t (x int);
# Create a file with "1\n2\n3\n" as /tmp/a.
load data local inpath '/tmp/a' into table t;
add jar hdfs://localhost:8020/Test.jar;
create temporary function cube as 'org.apache.hive.test.CubeSampleUDF';  # I wrote this
select cube(x) from t;
{noformat}

What else would it be reasonable for me to do? It looks like there's no DFS in the test environment. I might be able to register an ad-hoc file system implementation of some sort or use mockito or some such... What do you recommend?

I'm running the existing tests to make sure that I haven't broken anything. These seem to take a while, so I'll report back.

> UDFs can't be loaded via "add jar" when jar is on HDFS
> --
>
>                 Key: HIVE-1157
>                 URL: https://issues.apache.org/jira/browse/HIVE-1157
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Philip Zeyliger
>            Priority: Minor
>         Attachments: hive-1157.patch.txt
>
>
> As discussed on the mailing list, it would be nice if you could use UDFs that
> are on jars on HDFS.
The proposed implementation would be for "add jar" to > recognize that the target file is on HDFS, copy it locally, and load it into > the classpath. > {quote} > Hi folks, > I have a quick question about UDF support in Hive. I'm on the 0.5 branch. > Can you use a UDF where the jar which contains the function is on HDFS, and > not on the local filesystem. Specifically, the following does not seem to > work: > # This is Hive 0.5, from svn > $bin/hive > Hive history file=/tmp/philip/hive_job_log_philip_201002081541_370227273.txt > hive> add jar hdfs://localhost/FooTest.jar; > > Added hdfs://localhost/FooTest.jar to class path > hive> create temporary function cube as 'com.cloudera.FooTestUDF'; > > FAILED: Execution Error, return code 1 from > org.apache.hadoop.hive.ql.exec.FunctionTask > Does this work for other people? I could probably fix it by changing "add > jar" to download remote jars locally, when necessary (to load them into the > classpath), or update URLClassLoader (or whatever is underneath there) to > read directly from HDFS, which seems a bit more fragile. But I wanted to > make sure that my interpretation of what's going on is right before I have at > it. > Thanks, > -- Philip > {quote} > {quote} > Yes that's correct. I prefer to download the jars in "add jar". > Zheng > {quote} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
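As a rough illustration of the bookkeeping described in the first patch comment (decide whether a resource is already local, remember where its local copy lives, and keep the state in an EnumMap keyed by resource type), here is a small stand-alone sketch. Everything in it is an assumption made for illustration: the class ResourceLocalizationSketch, its ResourceType enum, and the add and isLocal methods are hypothetical, and the actual copy from HDFS is deliberately left out.

```java
import java.net.URI;
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of per-resource state tracking; it does not mirror
// SessionState.java and performs no real file copy.
final class ResourceLocalizationSketch {

    enum ResourceType { FILE, JAR }

    // resource type -> (original location -> path of the local copy)
    private final Map<ResourceType, Map<String, String>> resources =
            new EnumMap<>(ResourceType.class);

    // A location counts as local when it has no scheme or uses file:.
    static boolean isLocal(URI uri) {
        String scheme = uri.getScheme();
        return scheme == null || "file".equalsIgnoreCase(scheme);
    }

    // Records the resource and returns the path the classpath should use.
    // For remote locations the file name is placed under scratchDir; a real
    // implementation would download the file to that path at this point.
    String add(ResourceType type, String location, String scratchDir) {
        URI uri = URI.create(location);
        String path = uri.getPath();
        String localPath = isLocal(uri)
                ? path
                : scratchDir + "/" + path.substring(path.lastIndexOf('/') + 1);
        resources.computeIfAbsent(type, t -> new HashMap<>()).put(location, localPath);
        return localPath;
    }

    // Returns the recorded local path, or null if the resource is unknown.
    String localPathOf(ResourceType type, String location) {
        Map<String, String> byLocation = resources.get(type);
        return byLocation == null ? null : byLocation.get(location);
    }

    public static void main(String[] args) {
        ResourceLocalizationSketch session = new ResourceLocalizationSketch();
        System.out.println(session.add(ResourceType.JAR,
                "hdfs://localhost:8020/Test.jar", "/tmp/scratch"));
    }
}
```

Keeping the local path next to the original location is what makes cleanup possible later: a "delete jar" can look up the copy and remove it without guessing where it was written.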
[jira] Commented: (HIVE-1157) UDFs can't be loaded via "add jar" when jar is on HDFS
[ https://issues.apache.org/jira/browse/HIVE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832389#action_12832389 ]

Philip Zeyliger commented on HIVE-1157:
---

Edward, I'm not sure what you mean.

-- Philip

> UDFs can't be loaded via "add jar" when jar is on HDFS
> --
>
> Key: HIVE-1157
> URL: https://issues.apache.org/jira/browse/HIVE-1157
> Project: Hadoop Hive
> Issue Type: Improvement
> Components: Query Processor
> Reporter: Philip Zeyliger
> Priority: Minor
>
> As discussed on the mailing list, it would be nice if you could use UDFs that
> are on jars on HDFS. The proposed implementation would be for "add jar" to
> recognize that the target file is on HDFS, copy it locally, and load it into
> the classpath.
> {quote}
> Hi folks,
> I have a quick question about UDF support in Hive. I'm on the 0.5 branch.
> Can you use a UDF where the jar which contains the function is on HDFS, and
> not on the local filesystem. Specifically, the following does not seem to
> work:
> # This is Hive 0.5, from svn
> $bin/hive
> Hive history file=/tmp/philip/hive_job_log_philip_201002081541_370227273.txt
> hive> add jar hdfs://localhost/FooTest.jar;
>
> Added hdfs://localhost/FooTest.jar to class path
> hive> create temporary function cube as 'com.cloudera.FooTestUDF';
>
> FAILED: Execution Error, return code 1 from
> org.apache.hadoop.hive.ql.exec.FunctionTask
> Does this work for other people? I could probably fix it by changing "add
> jar" to download remote jars locally, when necessary (to load them into the
> classpath), or update URLClassLoader (or whatever is underneath there) to
> read directly from HDFS, which seems a bit more fragile. But I wanted to
> make sure that my interpretation of what's going on is right before I have at
> it.
> Thanks,
> -- Philip
> {quote}
> {quote}
> Yes that's correct. I prefer to download the jars in "add jar".
> Zheng
> {quote}

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-1157) UDFs can't be loaded via "add jar" when jar is on HDFS
UDFs can't be loaded via "add jar" when jar is on HDFS
--

Key: HIVE-1157
URL: https://issues.apache.org/jira/browse/HIVE-1157
Project: Hadoop Hive
Issue Type: Improvement
Components: Query Processor
Reporter: Philip Zeyliger
Priority: Minor

As discussed on the mailing list, it would be nice if you could use UDFs that
are on jars on HDFS. The proposed implementation would be for "add jar" to
recognize that the target file is on HDFS, copy it locally, and load it into
the classpath.

{quote}
Hi folks,
I have a quick question about UDF support in Hive. I'm on the 0.5 branch.
Can you use a UDF where the jar which contains the function is on HDFS, and
not on the local filesystem. Specifically, the following does not seem to
work:
# This is Hive 0.5, from svn
$bin/hive
Hive history file=/tmp/philip/hive_job_log_philip_201002081541_370227273.txt
hive> add jar hdfs://localhost/FooTest.jar;
Added hdfs://localhost/FooTest.jar to class path
hive> create temporary function cube as 'com.cloudera.FooTestUDF';
FAILED: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.FunctionTask
Does this work for other people? I could probably fix it by changing "add
jar" to download remote jars locally, when necessary (to load them into the
classpath), or update URLClassLoader (or whatever is underneath there) to
read directly from HDFS, which seems a bit more fragile. But I wanted to
make sure that my interpretation of what's going on is right before I have at
it.
Thanks,
-- Philip
{quote}
{quote}
Yes that's correct. I prefer to download the jars in "add jar".
Zheng
{quote}
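The approach proposed above (notice that the jar is not local, copy it to the local filesystem, then put the local copy on the classpath) can be sketched roughly as follows. This is only an illustration under stated assumptions, not Hive's actual code: the class AddJarSketch and the RemoteOpener hook are hypothetical names, and a real version would presumably open the stream through Hadoop's FileSystem API instead of the stand-in interface used here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch; none of these names come from the Hive code base.
final class AddJarSketch {

    /** Hypothetical hook that opens a stream for a non-local URI (e.g. hdfs://...). */
    public interface RemoteOpener {
        InputStream open(URI uri) throws IOException;
    }

    /** A URI counts as remote when it has a scheme other than "file". */
    static boolean isRemote(URI uri) {
        String scheme = uri.getScheme();
        return scheme != null && !scheme.equalsIgnoreCase("file");
    }

    /** Returns a local path for the jar, copying it to a temp file if it is remote. */
    static Path toLocal(URI uri, RemoteOpener opener) throws IOException {
        if (!isRemote(uri)) {
            // Plain path ("/tmp/x.jar") or file: URI; nothing to download.
            return uri.getScheme() == null ? Paths.get(uri.getPath()) : Paths.get(uri);
        }
        Path local = Files.createTempFile("added-jar-", ".jar");
        local.toFile().deleteOnExit();
        try (InputStream in = opener.open(uri)) {
            Files.copy(in, local, StandardCopyOption.REPLACE_EXISTING);
        }
        return local;
    }

    /** Builds a class loader that can see the (now local) jar. */
    static URLClassLoader withJar(URI jar, RemoteOpener opener, ClassLoader parent)
            throws IOException {
        URL localUrl = toLocal(jar, opener).toUri().toURL();
        return new URLClassLoader(new URL[] { localUrl }, parent);
    }
}
```

With something like this, "add jar hdfs://localhost/FooTest.jar" would hand a local file URL to URLClassLoader, which is the alternative Zheng preferred over teaching the class loader to read from HDFS directly.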
[jira] Commented: (HIVE-802) Bug in DataNucleus prevents Hive from building if inside a dir with '+' in it
[ https://issues.apache.org/jira/browse/HIVE-802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12828844#action_12828844 ]

Philip Zeyliger commented on HIVE-802:
--

If we uploaded a patched datanucleus to solve this issue, would folks be alright including it in Hive (and possibly the 0.5 branch)? I ran into this again recently, trying to run Hive's 0.5 metastore server against the current version of Cloudera's distribution, and it took a while to decipher the error.

Thanks,
-- Philip

> Bug in DataNucleus prevents Hive from building if inside a dir with '+' in it
> -
>
> Key: HIVE-802
> URL: https://issues.apache.org/jira/browse/HIVE-802
> Project: Hadoop Hive
> Issue Type: Bug
> Components: Build Infrastructure
> Reporter: Todd Lipcon
>
> There's a bug in DataNucleus that causes this issue:
> http://www.jpox.org/servlet/jira/browse/NUCCORE-371
> To reproduce, simply put your hive source tree in a directory that contains a
> '+' character.