When you use the -files option, the file is copied into a .staging directory and all mappers can access it, but the output format does not seem to be able to access it.
-files copies the cache file under:
/user/<id>/.staging/<job name>/files/<filename>

On Fri, Jul 29, 2011 at 11:14 AM, Alejandro Abdelnur <t...@cloudera.com> wrote:

> Mmmh, I've never used the -files option (I don't know if it will copy the
> files to HDFS for you or you have to put them there first).
>
> My usage pattern of the DC is copying the files to HDFS, then using the DC
> API to add those files to the jobconf.
>
> Alejandro
>
>
> On Fri, Jul 29, 2011 at 10:56 AM, Mapred Learn <mapred.le...@gmail.com> wrote:
>
>> I am trying to access a file that I sent with the -files option in my
>> hadoop jar command.
>>
>> In my outputformat,
>> I am doing something like:
>>
>> Path[] cacheFiles = DistributedCache.getLocalCacheFiles(conf);
>>
>> String file1 = "";
>> String file2 = "";
>> Path pt = null;
>>
>> for (Path p : cacheFiles) {
>>     if (p != null) {
>>         if (p.getName().endsWith(".ryp")) {
>>             file1 = p.getName();
>>         } else if (p.getName().endsWith(".cpt")) {
>>             file2 = p.getName();
>>             pt = p;
>>         }
>>     }
>> }
>>
>> // then read the file, which gives a "file does not exist" exception:
>>
>> Path pat = new Path(file2);
>>
>> BufferedReader reader = null;
>> try {
>>     FileSystem fs = FileSystem.get(conf);
>>     reader = new BufferedReader(
>>             new InputStreamReader(fs.open(pat)));
>>
>>     String line = null;
>>     while ((line = reader.readLine()) != null) {
>>         System.out.println("Now parsing the line: " + line);
>>     }
>> } catch (Exception e) {
>>     System.out.println("exception" + e.getMessage());
>> }
>>
>> On Fri, Jul 29, 2011 at 10:50 AM, Alejandro Abdelnur
>> <t...@cloudera.com> wrote:
>>
>>> Where are you getting the error, in the client submitting the job or in
>>> the MR tasks?
>>>
>>> Are you trying to access a file or trying to set a JAR in the
>>> DistributedCache?
>>> How/when are you adding the file/JAR to the DC?
>>> How are you retrieving the file/JAR from your outputformat code?
>>>
>>> Thxs.
>>>
>>> Alejandro
>>>
>>>
>>> On Fri, Jul 29, 2011 at 10:43 AM, Mapred Learn
>>> <mapred.le...@gmail.com> wrote:
>>>
>>>> I am trying to create a custom text outputformat where I want to access
>>>> a distributed cache file.
>>>>
>>>>
>>>>
>>>> On Fri, Jul 29, 2011 at 10:42 AM, Harsh J <ha...@cloudera.com> wrote:
>>>>
>>>>> Mapred,
>>>>>
>>>>> By outputformat, do you mean the frontend, submit-time run of
>>>>> OutputFormat? Then no, it cannot access the distributed cache because
>>>>> it's not really set up at that point, and the front end doesn't really
>>>>> need the distributed cache when it can access those files directly.
>>>>>
>>>>> Could you describe in a little more depth what you're attempting to do?
>>>>>
>>>>> On Fri, Jul 29, 2011 at 10:57 PM, Mapred Learn <mapred.le...@gmail.com>
>>>>> wrote:
>>>>> > Hi,
>>>>> > I am trying to access the distributed cache in my custom output
>>>>> > format, but it does not work: opening the file in the custom output
>>>>> > format fails with "file does not exist" even though it physically does.
>>>>> >
>>>>> > Looks like the distributed cache only works for Mappers and Reducers?
>>>>> >
>>>>> > Is there a way I can read the Distributed Cache in my custom output
>>>>> > format?
>>>>> >
>>>>> > Thanks,
>>>>> > -JJ
>>>>> >
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Harsh J
>>>>>
>>>>
>>>>
>>>
>>
>
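
One possible cause in the quoted OutputFormat code: it keeps only p.getName() and then opens that bare name through FileSystem.get(conf), which is normally the job's default filesystem (often HDFS), while getLocalCacheFiles returns paths on the task node's local disk. The sketch below illustrates the idea using plain JDK classes only, as a stand-in: java.nio.file.Path takes the place of Hadoop's Path, and the names LocalCacheRead, findBySuffix and readLines are hypothetical. In a real job the equivalent would presumably be to keep the full Path (the pt variable above) and open it via FileSystem.getLocal(conf); that has not been tested against the poster's setup.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: plain-JDK stand-in for the cache-file lookup in the thread.
class LocalCacheRead {

    // Returns the first localized file whose name ends with the suffix.
    // The FULL path is returned, not just the file name, so the
    // directory part is not lost before the file is opened.
    static Path findBySuffix(List<Path> localFiles, String suffix) {
        for (Path p : localFiles) {
            if (p != null && p.getFileName() != null
                    && p.getFileName().toString().endsWith(suffix)) {
                return p;
            }
        }
        return null; // nothing matched
    }

    // Reads every line of a file on the LOCAL filesystem.
    // try-with-resources closes the reader even if reading fails.
    static List<String> readLines(Path localFile) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader =
                Files.newBufferedReader(localFile, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```

Usage would be along the lines of readLines(findBySuffix(files, ".cpt")), after checking the result of findBySuffix for null. This only addresses the task-side read; as Harsh notes, code that runs at submit time on the front end cannot rely on the cache being set up.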