When you use the -files option, the file is copied into a .staging directory
and all mappers can access it, but as far as I can see the output format is
not able to access it.

-files copies the cache file under:

/user/<id>/.staging/<job name>/files/<filename>
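
One possible cause of the "file does not exist" error in the snippet quoted further down, assuming that code runs inside a task and not at submit time: it keeps only p.getName(), which drops the local cache directory, and then opens that bare name through FileSystem.get(conf), which returns the job's default file system (on a cluster typically HDFS, not the task's local disk). The sketch below illustrates the alternative of keeping the full path and reading it from the local file system. It deliberately uses java.nio.file instead of Hadoop's Path and FileSystem so that it runs without a cluster; the class name CacheFileLookup, its methods, and the file names are hypothetical. In a real job the array would come from DistributedCache.getLocalCacheFiles(conf).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop: stands in for the lookup loop
// in the quoted snippet, using java.nio.file.Path in place of Hadoop's Path.
class CacheFileLookup {

    // Returns the first entry whose file name ends with the given suffix,
    // keeping the FULL path (directories included), or null if none matches.
    // The array and its entries may be null, so both are checked.
    static Path findBySuffix(Path[] cacheFiles, String suffix) {
        if (cacheFiles == null) {
            return null;
        }
        for (Path p : cacheFiles) {
            if (p != null && p.getFileName() != null
                    && p.getFileName().toString().endsWith(suffix)) {
                return p;
            }
        }
        return null;
    }

    // Reads all lines of a file on the local file system.
    static List<String> readLines(Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader =
                 Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Simulate a local cache directory holding two localized files.
        Path dir = Files.createTempDirectory("localcache");
        Path ryp = Files.write(dir.resolve("rules.ryp"),
                "r1\n".getBytes(StandardCharsets.UTF_8));
        Path cpt = Files.write(dir.resolve("lookup.cpt"),
                "c1\nc2\n".getBytes(StandardCharsets.UTF_8));

        Path[] cacheFiles = { ryp, null, cpt };
        Path found = findBySuffix(cacheFiles, ".cpt");

        // The full path can be opened; the bare name alone would be
        // resolved against the current working directory instead.
        for (String line : readLines(found)) {
            System.out.println("Now parsing the line: " + line);
        }
    }
}
```

In Hadoop terms, the corresponding change would be to keep the whole Path returned by getLocalCacheFiles and open it through FileSystem.getLocal(conf) or plain java.io, not through FileSystem.get(conf). This does not help if the OutputFormat code is running on the client at submit time, where the cache has not been localized yet.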


On Fri, Jul 29, 2011 at 11:14 AM, Alejandro Abdelnur <t...@cloudera.com> wrote:

> Mmmh, I've never used the -files option (I don't know if it will copy the
> files to HDFS for you or if you have to put them there first).
>
> My usage pattern for the DC is to copy the files to HDFS, then use the DC
> API to add those files to the jobconf.
>
> Alejandro
>
>
> On Fri, Jul 29, 2011 at 10:56 AM, Mapred Learn <mapred.le...@gmail.com> wrote:
>
>> I'm trying to access a file that I sent with the -files option in my
>> hadoop jar command.
>>
>> In my outputformat,
>> I am doing something like:
>>
>> Path[] cacheFiles = DistributedCache.getLocalCacheFiles(conf);
>>
>>         String file1="";
>>         String file2="";
>>         Path pt=null;
>>
>>         for (Path p : cacheFiles) {
>>
>>             if (p != null) {
>>                 if (p.getName().endsWith(".ryp")) {
>>                     file1 = p.getName();
>>                 } else if (p.getName().endsWith(".cpt")) {
>>                     file2 = p.getName();
>>                     pt=p;
>>                 }
>>
>>             }
>>
>>         }
>>
>> // then read the file, which throws a "file does not exist" exception:
>>
>> Path pat = new Path(file2);
>>
>>         BufferedReader reader = null;
>>         try {
>>             FileSystem fs = FileSystem.get(conf);
>>             reader=new BufferedReader(
>>                     new InputStreamReader(fs.open(pat)));
>>
>>
>>             String line = null;
>>             while ((line = reader.readLine()) != null) {
>>                 System.out.println("Now parsing the line: " + line);
>>
>>
>>             }
>>         } catch (Exception e) {
>>             System.out.println("exception" + e.getMessage());
>>
>>         }
>>
>> On Fri, Jul 29, 2011 at 10:50 AM, Alejandro Abdelnur <t...@cloudera.com> wrote:
>>
>>> Where are you getting the error, in the client submitting the job or in
>>> the MR tasks?
>>>
>>> Are you trying to access a file or trying to set a JAR in the
>>> DistributedCache?
>>> How/when are you adding the file/JAR to the DC?
>>> How are you retrieving the file/JAR from your outputformat code?
>>>
>>> Thxs.
>>>
>>> Alejandro
>>>
>>>
>>> On Fri, Jul 29, 2011 at 10:43 AM, Mapred Learn <mapred.le...@gmail.com> wrote:
>>>
>>>> I am trying to create a custom text outputformat where I want to access
>>>> a distributed cache file.
>>>>
>>>>
>>>>
>>>> On Fri, Jul 29, 2011 at 10:42 AM, Harsh J <ha...@cloudera.com> wrote:
>>>>
>>>>> Mapred,
>>>>>
>>>>> By outputformat, do you mean the frontend, submit-time run of
>>>>> OutputFormat? Then no, it cannot access the distributed cache because
>>>>> it's not really set up at that point, and the front end doesn't really
>>>>> need the distributed cache when it can access those files directly.
>>>>>
>>>>> Could you describe in a bit more detail what you're attempting to do?
>>>>>
>>>>> On Fri, Jul 29, 2011 at 10:57 PM, Mapred Learn <mapred.le...@gmail.com> wrote:
>>>>> > Hi,
>>>>> > I am trying to access the distributed cache in my custom output format,
>>>>> > but it does not work: opening the file in the custom output format fails
>>>>> > with "file does not exist" even though the file physically exists.
>>>>> >
>>>>> > Looks like the distributed cache only works for Mappers and Reducers?
>>>>> >
>>>>> > Is there a way I can read the Distributed Cache in my custom output
>>>>> > format?
>>>>> >
>>>>> > Thanks,
>>>>> > -JJ
>>>>> >
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Harsh J
>>>>>
>>>>
>>>>
>>>
>>
>
