Thanks Chris for your reply!

Well, I could not understand much of what has been discussed on that forum.
I am unaware of Cascading.

My problem is simple - I want a directory to be present in the local working
directory of tasks so that I can access it from my map task in the following
manner:

FileInputStream fin = new FileInputStream("Config/file1.config"); 

where Config is a directory containing many files/directories, one of which
is file1.config.

It would be helpful if you could tell me what statements to use to
distribute a directory to the tasktrackers.
The API doc http://hadoop.apache.org/core/docs/r0.20.0/api/index.html says
that archives are unzipped on the tasktrackers, but I want an example of how
to use this in the case of a directory.
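
To make the problem concrete, here is a minimal, self-contained sketch
(plain java.util.zip, no Hadoop) of the entry layout I am expecting inside
Config.zip - for "Config/file1.config" to resolve after the archive is
un-zipped, each entry presumably needs the top-level "Config/" prefix. The
file names and contents here are illustrative only, not my actual config:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipLayout {

    // Build a zip in memory with the expected "Config/" prefix on every
    // entry, then read the entry names back - the same names a tasktracker's
    // unzip step would see when it un-archives the file.
    static List<String> buildAndList() throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        ZipOutputStream zos = new ZipOutputStream(buf);
        zos.putNextEntry(new ZipEntry("Config/"));            // directory entry
        zos.closeEntry();
        zos.putNextEntry(new ZipEntry("Config/file1.config")); // file entry
        zos.write("key=value\n".getBytes("UTF-8"));
        zos.closeEntry();
        zos.close();

        List<String> names = new ArrayList<String>();
        ZipInputStream zis =
            new ZipInputStream(new ByteArrayInputStream(buf.toByteArray()));
        for (ZipEntry e; (e = zis.getNextEntry()) != null; ) {
            names.add(e.getName());
        }
        zis.close();
        return names;
    }

    public static void main(String[] args) throws IOException {
        for (String name : buildAndList()) {
            System.out.println(name);
        }
    }
}
```

If the zip was instead created from inside the Config directory (so the
entries are just "file1.config" with no prefix), then after un-archiving
there would be no "Config" path component, which could explain a
FileNotFoundException on the open - this is a guess on my part, though.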

Thanks,
Akhil



Chris Curtin-2 wrote:
> 
> Hi,
> 
> I've found it much easier to write the file to HDFS using the API, then pass
> the 'path' to the file in HDFS as a property. You'll need to remember to
> clean up the file after you're done with it.
> 
> Example details are in this thread:
> http://groups.google.com/group/cascading-user/browse_thread/thread/d5c619349562a8d6#
> 
> Hope this helps,
> 
> Chris
> 
> On Thu, Jun 25, 2009 at 4:50 PM, akhil1988 <akhilan...@gmail.com> wrote:
> 
>>
>> Please ask any questions if I am not clear above about the problem I am
>> facing.
>>
>> Thanks,
>> Akhil
>>
>> akhil1988 wrote:
>> >
>> > Hi All!
>> >
>> > I want a directory to be present in the local working directory of the
>> > task for which I am using the following statements:
>> >
>> > DistributedCache.addCacheArchive(new URI("/home/akhil1988/Config.zip"),
>> > conf);
>> > DistributedCache.createSymlink(conf);
>> >
>> >>> Here Config is a directory which I have zipped and put at the given
>> >>> location in HDFS
>> >
>> > I have zipped the directory because the API doc of DistributedCache
>> > (http://hadoop.apache.org/core/docs/r0.20.0/api/index.html) says that
>> the
>> > archive files are unzipped in the local cache directory :
>> >
>> > DistributedCache can be used to distribute simple, read-only data/text
>> > files and/or more complex types such as archives, jars etc. Archives
>> (zip,
>> > tar and tgz/tar.gz files) are un-archived at the slave nodes.
>> >
>> > So, from my understanding of the API docs I expect that the Config.zip
>> > file will be unzipped to Config directory and since I have SymLinked
>> them
>> > I can access the directory in the following manner from my map
>> function:
>> >
>> > FileInputStream fin = new FileInputStream("Config/file1.config");
>> >
>> > But I get the FileNotFoundException on the execution of this statement.
>> > Please let me know where I am going wrong.
>> >
>> > Thanks,
>> > Akhil
>> >
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Using-addCacheArchive-tp24207739p24210836.html
>> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>>
>>
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Using-addCacheArchive-tp24207739p24229338.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.