[jira] Updated: (HIVE-1157) UDFs can't be loaded via "add jar" when jar is on HDFS

2010-10-01 Thread Philip Zeyliger (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Zeyliger updated HIVE-1157:
--

Attachment: HIVE-1157.patch.v6.txt

> UDFs can't be loaded via "add jar" when jar is on HDFS
> --
>
> Key: HIVE-1157
> URL: https://issues.apache.org/jira/browse/HIVE-1157
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Philip Zeyliger
>Priority: Minor
> Attachments: hive-1157.patch.txt, HIVE-1157.patch.v3.txt, 
> HIVE-1157.patch.v4.txt, HIVE-1157.patch.v5.txt, HIVE-1157.patch.v6.txt, 
> HIVE-1157.v2.patch.txt, output.txt
>
>
> As discussed on the mailing list, it would be nice if you could use UDFs that 
> are on jars on HDFS.  The proposed implementation would be for "add jar" to 
> recognize that the target file is on HDFS, copy it locally, and load it into 
> the classpath.
> {quote}
> Hi folks,
> I have a quick question about UDF support in Hive.  I'm on the 0.5 branch.  
> Can you use a UDF where the jar which contains the function is on HDFS, and 
> not on the local filesystem.  Specifically, the following does not seem to 
> work:
> # This is Hive 0.5, from svn
> $bin/hive  
> Hive history file=/tmp/philip/hive_job_log_philip_201002081541_370227273.txt
> hive> add jar hdfs://localhost/FooTest.jar;   
>
> Added hdfs://localhost/FooTest.jar to class path
> hive> create temporary function cube as 'com.cloudera.FooTestUDF';
> 
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.FunctionTask
> Does this work for other people?  I could probably fix it by changing "add 
> jar" to download remote jars locally, when necessary (to load them into the 
> classpath), or update URLClassLoader (or whatever is underneath there) to 
> read directly from HDFS, which seems a bit more fragile.  But I wanted to 
> make sure that my interpretation of what's going on is right before I have at 
> it.
> Thanks,
> -- Philip
> {quote}
> {quote}
> Yes that's correct. I prefer to download the jars in "add jar".
> Zheng
> {quote}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1157) UDFs can't be loaded via "add jar" when jar is on HDFS

2010-09-30 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1157:
-

Status: Open  (was: Patch Available)

> UDFs can't be loaded via "add jar" when jar is on HDFS
> --
>
> Key: HIVE-1157
> URL: https://issues.apache.org/jira/browse/HIVE-1157
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Philip Zeyliger
>Priority: Minor
> Attachments: hive-1157.patch.txt, HIVE-1157.patch.v3.txt, 
> HIVE-1157.patch.v4.txt, HIVE-1157.patch.v5.txt, HIVE-1157.v2.patch.txt, 
> output.txt
>
>
> As discussed on the mailing list, it would be nice if you could use UDFs that 
> are on jars on HDFS.  The proposed implementation would be for "add jar" to 
> recognize that the target file is on HDFS, copy it locally, and load it into 
> the classpath.
> {quote}
> Hi folks,
> I have a quick question about UDF support in Hive.  I'm on the 0.5 branch.  
> Can you use a UDF where the jar which contains the function is on HDFS, and 
> not on the local filesystem.  Specifically, the following does not seem to 
> work:
> # This is Hive 0.5, from svn
> $bin/hive  
> Hive history file=/tmp/philip/hive_job_log_philip_201002081541_370227273.txt
> hive> add jar hdfs://localhost/FooTest.jar;   
>
> Added hdfs://localhost/FooTest.jar to class path
> hive> create temporary function cube as 'com.cloudera.FooTestUDF';
> 
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.FunctionTask
> Does this work for other people?  I could probably fix it by changing "add 
> jar" to download remote jars locally, when necessary (to load them into the 
> classpath), or update URLClassLoader (or whatever is underneath there) to 
> read directly from HDFS, which seems a bit more fragile.  But I wanted to 
> make sure that my interpretation of what's going on is right before I have at 
> it.
> Thanks,
> -- Philip
> {quote}
> {quote}
> Yes that's correct. I prefer to download the jars in "add jar".
> Zheng
> {quote}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1157) UDFs can't be loaded via "add jar" when jar is on HDFS

2010-09-27 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1157:
-

Status: Patch Available  (was: Open)

> UDFs can't be loaded via "add jar" when jar is on HDFS
> --
>
> Key: HIVE-1157
> URL: https://issues.apache.org/jira/browse/HIVE-1157
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Philip Zeyliger
>Priority: Minor
> Attachments: hive-1157.patch.txt, HIVE-1157.patch.v3.txt, 
> HIVE-1157.patch.v4.txt, HIVE-1157.patch.v5.txt, HIVE-1157.v2.patch.txt, 
> output.txt
>
>
> As discussed on the mailing list, it would be nice if you could use UDFs that 
> are on jars on HDFS.  The proposed implementation would be for "add jar" to 
> recognize that the target file is on HDFS, copy it locally, and load it into 
> the classpath.
> {quote}
> Hi folks,
> I have a quick question about UDF support in Hive.  I'm on the 0.5 branch.  
> Can you use a UDF where the jar which contains the function is on HDFS, and 
> not on the local filesystem.  Specifically, the following does not seem to 
> work:
> # This is Hive 0.5, from svn
> $bin/hive  
> Hive history file=/tmp/philip/hive_job_log_philip_201002081541_370227273.txt
> hive> add jar hdfs://localhost/FooTest.jar;   
>
> Added hdfs://localhost/FooTest.jar to class path
> hive> create temporary function cube as 'com.cloudera.FooTestUDF';
> 
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.FunctionTask
> Does this work for other people?  I could probably fix it by changing "add 
> jar" to download remote jars locally, when necessary (to load them into the 
> classpath), or update URLClassLoader (or whatever is underneath there) to 
> read directly from HDFS, which seems a bit more fragile.  But I wanted to 
> make sure that my interpretation of what's going on is right before I have at 
> it.
> Thanks,
> -- Philip
> {quote}
> {quote}
> Yes that's correct. I prefer to download the jars in "add jar".
> Zheng
> {quote}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1157) UDFs can't be loaded via "add jar" when jar is on HDFS

2010-09-27 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1157:
-

Attachment: HIVE-1157.patch.v5.txt

Attaching an updated version of Phil's patch that applies cleanly with -p0

> UDFs can't be loaded via "add jar" when jar is on HDFS
> --
>
> Key: HIVE-1157
> URL: https://issues.apache.org/jira/browse/HIVE-1157
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Philip Zeyliger
>Priority: Minor
> Attachments: hive-1157.patch.txt, HIVE-1157.patch.v3.txt, 
> HIVE-1157.patch.v4.txt, HIVE-1157.patch.v5.txt, HIVE-1157.v2.patch.txt, 
> output.txt
>
>
> As discussed on the mailing list, it would be nice if you could use UDFs that 
> are on jars on HDFS.  The proposed implementation would be for "add jar" to 
> recognize that the target file is on HDFS, copy it locally, and load it into 
> the classpath.
> {quote}
> Hi folks,
> I have a quick question about UDF support in Hive.  I'm on the 0.5 branch.  
> Can you use a UDF where the jar which contains the function is on HDFS, and 
> not on the local filesystem.  Specifically, the following does not seem to 
> work:
> # This is Hive 0.5, from svn
> $bin/hive  
> Hive history file=/tmp/philip/hive_job_log_philip_201002081541_370227273.txt
> hive> add jar hdfs://localhost/FooTest.jar;   
>
> Added hdfs://localhost/FooTest.jar to class path
> hive> create temporary function cube as 'com.cloudera.FooTestUDF';
> 
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.FunctionTask
> Does this work for other people?  I could probably fix it by changing "add 
> jar" to download remote jars locally, when necessary (to load them into the 
> classpath), or update URLClassLoader (or whatever is underneath there) to 
> read directly from HDFS, which seems a bit more fragile.  But I wanted to 
> make sure that my interpretation of what's going on is right before I have at 
> it.
> Thanks,
> -- Philip
> {quote}
> {quote}
> Yes that's correct. I prefer to download the jars in "add jar".
> Zheng
> {quote}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1157) UDFs can't be loaded via "add jar" when jar is on HDFS

2010-09-27 Thread Philip Zeyliger (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Zeyliger updated HIVE-1157:
--

Attachment: HIVE-1157.patch.v4.txt

Carl,

I updated the patch to current trunk.

> UDFs can't be loaded via "add jar" when jar is on HDFS
> --
>
> Key: HIVE-1157
> URL: https://issues.apache.org/jira/browse/HIVE-1157
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Philip Zeyliger
>Priority: Minor
> Attachments: hive-1157.patch.txt, HIVE-1157.patch.v3.txt, 
> HIVE-1157.patch.v4.txt, HIVE-1157.v2.patch.txt, output.txt
>
>
> As discussed on the mailing list, it would be nice if you could use UDFs that 
> are on jars on HDFS.  The proposed implementation would be for "add jar" to 
> recognize that the target file is on HDFS, copy it locally, and load it into 
> the classpath.
> {quote}
> Hi folks,
> I have a quick question about UDF support in Hive.  I'm on the 0.5 branch.  
> Can you use a UDF where the jar which contains the function is on HDFS, and 
> not on the local filesystem.  Specifically, the following does not seem to 
> work:
> # This is Hive 0.5, from svn
> $bin/hive  
> Hive history file=/tmp/philip/hive_job_log_philip_201002081541_370227273.txt
> hive> add jar hdfs://localhost/FooTest.jar;   
>
> Added hdfs://localhost/FooTest.jar to class path
> hive> create temporary function cube as 'com.cloudera.FooTestUDF';
> 
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.FunctionTask
> Does this work for other people?  I could probably fix it by changing "add 
> jar" to download remote jars locally, when necessary (to load them into the 
> classpath), or update URLClassLoader (or whatever is underneath there) to 
> read directly from HDFS, which seems a bit more fragile.  But I wanted to 
> make sure that my interpretation of what's going on is right before I have at 
> it.
> Thanks,
> -- Philip
> {quote}
> {quote}
> Yes that's correct. I prefer to download the jars in "add jar".
> Zheng
> {quote}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1157) UDFs can't be loaded via "add jar" when jar is on HDFS

2010-03-23 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1157:
--

Attachment: output.txt

> UDFs can't be loaded via "add jar" when jar is on HDFS
> --
>
> Key: HIVE-1157
> URL: https://issues.apache.org/jira/browse/HIVE-1157
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Philip Zeyliger
>Priority: Minor
> Attachments: hive-1157.patch.txt, HIVE-1157.patch.v3.txt, 
> HIVE-1157.v2.patch.txt, output.txt
>
>
> As discussed on the mailing list, it would be nice if you could use UDFs that 
> are on jars on HDFS.  The proposed implementation would be for "add jar" to 
> recognize that the target file is on HDFS, copy it locally, and load it into 
> the classpath.
> {quote}
> Hi folks,
> I have a quick question about UDF support in Hive.  I'm on the 0.5 branch.  
> Can you use a UDF where the jar which contains the function is on HDFS, and 
> not on the local filesystem.  Specifically, the following does not seem to 
> work:
> # This is Hive 0.5, from svn
> $bin/hive  
> Hive history file=/tmp/philip/hive_job_log_philip_201002081541_370227273.txt
> hive> add jar hdfs://localhost/FooTest.jar;   
>
> Added hdfs://localhost/FooTest.jar to class path
> hive> create temporary function cube as 'com.cloudera.FooTestUDF';
> 
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.FunctionTask
> Does this work for other people?  I could probably fix it by changing "add 
> jar" to download remote jars locally, when necessary (to load them into the 
> classpath), or update URLClassLoader (or whatever is underneath there) to 
> read directly from HDFS, which seems a bit more fragile.  But I wanted to 
> make sure that my interpretation of what's going on is right before I have at 
> it.
> Thanks,
> -- Philip
> {quote}
> {quote}
> Yes that's correct. I prefer to download the jars in "add jar".
> Zheng
> {quote}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1157) UDFs can't be loaded via "add jar" when jar is on HDFS

2010-03-22 Thread Philip Zeyliger (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Zeyliger updated HIVE-1157:
--

Attachment: HIVE-1157.patch.v3.txt

Ed,

Indeed, I've been able to reproduce that.  I traced it down to some bad error 
handling when scratch_dir doesn't exist.  The new patch creates a scratch dir 
if it doesn't already exist, and adds an if/else to make sure 
localFile.delete() isn't called if localFile is null.

Sorry about that.  I'm not sure whether something changed between when I 
created the patch and now on trunk to change how the scratchdir works, or if I 
had the scratch dir craeted by other tests in my local checkout.  Either way, 
this should fix it.

Thanks!

> UDFs can't be loaded via "add jar" when jar is on HDFS
> --
>
> Key: HIVE-1157
> URL: https://issues.apache.org/jira/browse/HIVE-1157
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Philip Zeyliger
>Priority: Minor
> Attachments: hive-1157.patch.txt, HIVE-1157.patch.v3.txt, 
> HIVE-1157.v2.patch.txt
>
>
> As discussed on the mailing list, it would be nice if you could use UDFs that 
> are on jars on HDFS.  The proposed implementation would be for "add jar" to 
> recognize that the target file is on HDFS, copy it locally, and load it into 
> the classpath.
> {quote}
> Hi folks,
> I have a quick question about UDF support in Hive.  I'm on the 0.5 branch.  
> Can you use a UDF where the jar which contains the function is on HDFS, and 
> not on the local filesystem.  Specifically, the following does not seem to 
> work:
> # This is Hive 0.5, from svn
> $bin/hive  
> Hive history file=/tmp/philip/hive_job_log_philip_201002081541_370227273.txt
> hive> add jar hdfs://localhost/FooTest.jar;   
>
> Added hdfs://localhost/FooTest.jar to class path
> hive> create temporary function cube as 'com.cloudera.FooTestUDF';
> 
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.FunctionTask
> Does this work for other people?  I could probably fix it by changing "add 
> jar" to download remote jars locally, when necessary (to load them into the 
> classpath), or update URLClassLoader (or whatever is underneath there) to 
> read directly from HDFS, which seems a bit more fragile.  But I wanted to 
> make sure that my interpretation of what's going on is right before I have at 
> it.
> Thanks,
> -- Philip
> {quote}
> {quote}
> Yes that's correct. I prefer to download the jars in "add jar".
> Zheng
> {quote}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1157) UDFs can't be loaded via "add jar" when jar is on HDFS

2010-03-16 Thread Philip Zeyliger (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Zeyliger updated HIVE-1157:
--

Attachment: HIVE-1157.v2.patch.txt

I've uploaded a new patch with a bug fix (wasn't unregistering the jars 
correctly) and with a test.

The test starts a MiniDFSCluster and runs add jar and delete jar explicitly, 
without using the ".q" framework.  I felt this was the best way to test just 
the new behavior.

> UDFs can't be loaded via "add jar" when jar is on HDFS
> --
>
> Key: HIVE-1157
> URL: https://issues.apache.org/jira/browse/HIVE-1157
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Philip Zeyliger
>Priority: Minor
> Attachments: hive-1157.patch.txt, HIVE-1157.v2.patch.txt
>
>
> As discussed on the mailing list, it would be nice if you could use UDFs that 
> are on jars on HDFS.  The proposed implementation would be for "add jar" to 
> recognize that the target file is on HDFS, copy it locally, and load it into 
> the classpath.
> {quote}
> Hi folks,
> I have a quick question about UDF support in Hive.  I'm on the 0.5 branch.  
> Can you use a UDF where the jar which contains the function is on HDFS, and 
> not on the local filesystem.  Specifically, the following does not seem to 
> work:
> # This is Hive 0.5, from svn
> $bin/hive  
> Hive history file=/tmp/philip/hive_job_log_philip_201002081541_370227273.txt
> hive> add jar hdfs://localhost/FooTest.jar;   
>
> Added hdfs://localhost/FooTest.jar to class path
> hive> create temporary function cube as 'com.cloudera.FooTestUDF';
> 
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.FunctionTask
> Does this work for other people?  I could probably fix it by changing "add 
> jar" to download remote jars locally, when necessary (to load them into the 
> classpath), or update URLClassLoader (or whatever is underneath there) to 
> read directly from HDFS, which seems a bit more fragile.  But I wanted to 
> make sure that my interpretation of what's going on is right before I have at 
> it.
> Thanks,
> -- Philip
> {quote}
> {quote}
> Yes that's correct. I prefer to download the jars in "add jar".
> Zheng
> {quote}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1157) UDFs can't be loaded via "add jar" when jar is on HDFS

2010-03-02 Thread Philip Zeyliger (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Zeyliger updated HIVE-1157:
--

Attachment: hive-1157.patch.txt

This patch changes SessionState.java to copy jar resources locally, if they're 
not local already.

Because I had to manage additional per-resource state (namely, the location of 
the local copy, so that it can be cleaned up), I modified the ResourceType enum 
to be simply an enum, and now there is one ResourceHook object per resource, 
not per resource type.  I changed the container map to be an EnumMap.

It turns out that you can't specify an HDFS path to "-libjars", so I had to 
also modify ExecDriver.java to call a special method when it's getting jar 
resources.

I would appreciate some guidance on how to test this best.  So far, I've 
manually done the following steps:
{noformat}
create table t (x int);
# Create a file with "1\n2\n3\n" as /tmp/a.
load data local inpath '/tmp/a' into table t;
add jar hdfs://localhost:8020/Test.jar;
create temporary function cube as 'org.apache.hive.test.CubeSampleUDF';  # I 
wrote this
select cube(x) from t;
{noformat}
What else would it be reasonable for me to do?  It looks like there's no DFS in 
the test environment.  I might be able to register an ad-hoc file system 
implementation of some sort or use mockito or some such...  What do you 
recommend?

I'm running the existing tests to make sure that I haven't broken anything.  
These seem to take a while, so I'll report back.

> UDFs can't be loaded via "add jar" when jar is on HDFS
> --
>
> Key: HIVE-1157
> URL: https://issues.apache.org/jira/browse/HIVE-1157
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Philip Zeyliger
>Priority: Minor
> Attachments: hive-1157.patch.txt
>
>
> As discussed on the mailing list, it would be nice if you could use UDFs that 
> are on jars on HDFS.  The proposed implementation would be for "add jar" to 
> recognize that the target file is on HDFS, copy it locally, and load it into 
> the classpath.
> {quote}
> Hi folks,
> I have a quick question about UDF support in Hive.  I'm on the 0.5 branch.  
> Can you use a UDF where the jar which contains the function is on HDFS, and 
> not on the local filesystem.  Specifically, the following does not seem to 
> work:
> # This is Hive 0.5, from svn
> $bin/hive  
> Hive history file=/tmp/philip/hive_job_log_philip_201002081541_370227273.txt
> hive> add jar hdfs://localhost/FooTest.jar;   
>
> Added hdfs://localhost/FooTest.jar to class path
> hive> create temporary function cube as 'com.cloudera.FooTestUDF';
> 
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.FunctionTask
> Does this work for other people?  I could probably fix it by changing "add 
> jar" to download remote jars locally, when necessary (to load them into the 
> classpath), or update URLClassLoader (or whatever is underneath there) to 
> read directly from HDFS, which seems a bit more fragile.  But I wanted to 
> make sure that my interpretation of what's going on is right before I have at 
> it.
> Thanks,
> -- Philip
> {quote}
> {quote}
> Yes that's correct. I prefer to download the jars in "add jar".
> Zheng
> {quote}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.