Hey Jamie,

Thanks for the reply. I asked about it in the Cloudera IRC channel, so maybe they'll look into it. In the meantime, I'm going to go ahead and copy that file over to my datanodes :-)
-Kim

On Tue, Oct 5, 2010 at 2:54 PM, Jamie Cockrill <jamie.cockr...@gmail.com> wrote:
> Hi Kim,
>
> We didn't fix it in the end. I just ended up manually writing the
> files to the cluster using the FileSystem class, and then reading them
> back out again on the other side. Not terribly efficient, as I guess
> the point of DistributedCache is that the files get distributed to
> every node, whereas I'm only writing to two or three nodes, and then every
> map task tries to read back from the two or three nodes the
> data are stored on.
>
> Unfortunately I didn't have the will or inclination to investigate it
> any further, as I had some pretty tight deadlines to keep to and it
> hasn't caused me any significant problems yet...
>
> Thanks,
>
> Jamie
>
> On 5 October 2010 22:30, Kim Vogt <k...@simplegeo.com> wrote:
> > I'm experiencing the same problem. I was hoping there would be a reply to
> > this. Anyone? Bueller?
> >
> > -Kim
> >
> > On Fri, Jul 16, 2010 at 1:58 AM, Jamie Cockrill <jamie.cockr...@gmail.com> wrote:
> >
> >> Dear All,
> >>
> >> We recently upgraded from CDH3b1 to b2, and ever since, all our
> >> mapreduce jobs that use the DistributedCache have failed. Typically,
> >> we add files to the cache prior to job startup using
> >> addCacheFile(URI, conf), and then get them on the other side using
> >> getLocalCacheFiles(conf). I believe the hadoop-core versions for these
> >> are 0.20.2+228 and +320 respectively.
> >>
> >> We then open the files and read them in using a standard FileReader,
> >> using the toString() of the Path object as the constructor parameter,
> >> which has worked fine up to now. However, we're now getting
> >> FileNotFoundExceptions when the FileReader tries to open the file.
> >>
> >> Unfortunately the cluster is on an airgapped network, but the
> >> FileNotFound line comes out like:
> >>
> >> java.io.FileNotFoundException:
> >> /tmp/hadoop-hadoop/mapred/local/taskTracker/archive/master/path/to/my/file/filename.txt/filename.txt
> >>
> >> Note: the duplication of filename.txt is deliberate on my part; that is
> >> exactly what the exception prints. I'm not sure if
> >> that's strange or not, as this has previously worked absolutely fine.
> >> Has anyone else experienced this? Apologies if this is known; I've
> >> only just joined the list.
> >>
> >> Many thanks,
> >>
> >> Jamie
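For anyone hitting the same exception: since the doubled path suggests the cache entry is being localized as a directory named filename.txt containing the real filename.txt, one possible stopgap is to probe both layouts before opening the FileReader. This is just a sketch of that idea; CachePathFix and resolveCacheFile are made-up names, not part of the Hadoop API, and the doubled-directory layout is an assumption based on the exception text above.

```java
import java.io.File;
import java.io.FileNotFoundException;

// Hypothetical helper (not part of Hadoop): works around the doubled-filename
// localization seen above, where the cache entry shows up on disk as
// .../filename.txt/filename.txt instead of .../filename.txt.
public class CachePathFix {

    // Given the local path reported for a cache file (e.g. from
    // getLocalCacheFiles), return a File that actually exists: either the
    // path itself, or an identically named file nested one level inside it.
    public static File resolveCacheFile(File f) throws FileNotFoundException {
        if (f.isFile()) {
            return f; // normal case: the path points straight at the file
        }
        // suspected buggy layout: a directory whose child repeats its name
        File doubled = new File(f, f.getName());
        if (doubled.isFile()) {
            return doubled;
        }
        throw new FileNotFoundException(f.getPath());
    }
}
```

You would then construct the FileReader from resolveCacheFile(path).toString() instead of the raw path, which keeps working if a later upgrade restores the original single-filename layout.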