Hi Edward,

I was able to use the distributed cache by setting the mapred.cache.files option with the set command. I could then read the files locally using the standard Java APIs.
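For anyone finding this thread later, here is a minimal sketch of the approach, not the exact code I ran. It assumes the cached file holds one entry per line and that a lookup succeeds when any line contains the pattern. The class name PatternLookup and the load() helper are made up for illustration. In a real UDF the same logic would sit in a class extending org.apache.hadoop.hive.ql.exec.UDF; it is left out here so the snippet compiles without Hive on the classpath.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hive or Hadoop.
class PatternLookup {
    private final Path file;
    private List<String> lines; // loaded lazily on the first evaluate() call

    // A relative file name resolves against the task's working directory,
    // which is where a distributed-cache file is expected to show up.
    PatternLookup(String fileName) {
        this.file = Paths.get(fileName);
    }

    // Returns true if any line of the lookup file contains the pattern.
    boolean evaluate(String pattern) throws IOException {
        if (pattern == null) {
            return false;
        }
        if (lines == null) {
            lines = load();
        }
        for (String line : lines) {
            if (line.contains(pattern)) {
                return true;
            }
        }
        return false;
    }

    // Reads the file once with plain java.nio; no JobConf is needed.
    private List<String> load() throws IOException {
        List<String> result = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.trim().isEmpty()) {
                    result.add(line);
                }
            }
        }
        return result;
    }
}
```

Loading the file lazily and keeping it in memory means it is read once per task rather than once per row. For a very large file a different structure may be needed.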

Thanks
Viraj

________________________________
From: Edward Capriolo [mailto:[email protected]]
Sent: Tuesday, June 22, 2010 7:44 AM
To: [email protected]
Subject: Re: Using Distributed Cache in Hive UDF's??

If you put a file in the distributed cache, it is in the working directory of the UDF, so you do not need fancy Hadoop-isms to access it.

Shameless plug: My geo-ip-udf does exactly this.

http://www.jointhegrid.com/hive-udf-geo-ip-jtg/index.jsp
http://www.jointhegrid.com/svn/hive-udf-geo-ip-jtg/

Edward

On Mon, Jun 21, 2010 at 7:03 PM, Viraj Bhat <[email protected]> wrote:

Hi all,

I have a lookup function in Hive which checks whether a certain pattern is present in a large text file. I upload this text file to HDFS and hope to use it in my UDF's evaluate() method. Is there some documentation I can look up?

Distributed Cache relies on:

lookupFiles = DistributedCache.getLocalCacheFiles(job);

where job is of type JobConf. Where do I get the JobConf object from within the UDF?

Thanks
Viraj
