Hey Jamie,

Thanks for the reply. I asked about it in the Cloudera IRC channel, so maybe they'll look into it. In the meantime, I'm going to go ahead and copy that file over to my datanodes :-)
-Kim

On Tue, Oct 5, 2010 at 2:54 PM, Jamie Cockrill <jamie.cockr...@gmail.com> wrote:
> Hi Kim,
>
> We didn't fix it in the end. I just ended up manually writing the
> files to the cluster using the FileSystem class, and then reading them
> back out again on the other side. Not terribly efficient, as I guess
> the point of DistributedCache is that the files get distributed to
> every node, whereas I'm only writing to two or three nodes, and then every
> map task tries to read back from the two or three nodes the
> data are stored on.
>
> Unfortunately I didn't have the will or inclination to investigate it
> any further, as I had some pretty tight deadlines to keep to and it
> hasn't caused me any significant problems yet...
>
> Thanks,
>
> Jamie
>
> On 5 October 2010 22:30, Kim Vogt <k...@simplegeo.com> wrote:
> > I'm experiencing the same problem. I was hoping there would be a reply to
> > this. Anyone? Bueller?
> >
> > -Kim
> >
> > On Fri, Jul 16, 2010 at 1:58 AM, Jamie Cockrill <jamie.cockr...@gmail.com> wrote:
> >
> >> Dear All,
> >>
> >> We recently upgraded from CDH3b1 to b2, and ever since, all our
> >> mapreduce jobs that use the DistributedCache have failed. Typically,
> >> we add files to the cache prior to job startup using
> >> addCacheFile(URI, conf), and then get them on the other side using
> >> getLocalCacheFiles(conf). I believe the hadoop-core versions for these
> >> are 0.20.2+228 and +320 respectively.
> >>
> >> We then open the files and read them in using a standard FileReader,
> >> using the toString() of the Path object as the constructor parameter,
> >> which has worked fine up to now. However, we're now getting
> >> FileNotFoundExceptions when the FileReader tries to open the file.
> >>
> >> Unfortunately the cluster is on an airgapped network, but the
> >> FileNotFound line comes out like:
> >>
> >> java.io.FileNotFoundException:
> >> /tmp/hadoop-hadoop/mapred/local/taskTracker/archive/master/path/to/my/file/filename.txt/filename.txt
> >>
> >> Note: the duplication of filename.txt is deliberate on my part; that is
> >> exactly what the exception prints. I'm not sure if
> >> that's strange or not, as this has previously worked absolutely fine.
> >> Has anyone else experienced this? Apologies if this is known; I've
> >> only just joined the list.
> >>
> >> Many thanks,
> >>
> >> Jamie
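For anyone hitting the same exception: since the doubled path suggests the cache entry is being localized as a directory named filename.txt containing the real filename.txt, one possible stopgap is to probe both layouts before opening the FileReader. This is just a sketch of that idea; CachePathFix and resolveCacheFile are made-up names, not part of the Hadoop API, and the doubled-directory layout is an assumption based on the exception text above.

```java
import java.io.File;
import java.io.FileNotFoundException;

// Hypothetical helper (not part of Hadoop): works around the doubled-filename
// localization seen above, where the cache entry shows up on disk as
// .../filename.txt/filename.txt instead of .../filename.txt.
public class CachePathFix {

    // Given the local path reported for a cache file (e.g. from
    // getLocalCacheFiles), return a File that actually exists: either the
    // path itself, or an identically named file nested one level inside it.
    public static File resolveCacheFile(File f) throws FileNotFoundException {
        if (f.isFile()) {
            return f; // normal case: the path points straight at the file
        }
        // suspected buggy layout: a directory whose child repeats its name
        File doubled = new File(f, f.getName());
        if (doubled.isFile()) {
            return doubled;
        }
        throw new FileNotFoundException(f.getPath());
    }
}
```

You would then construct the FileReader from resolveCacheFile(path).toString() instead of the raw path, which keeps working if a later upgrade restores the original single-filename layout.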