Consider the case of a file being removed and recreated with the same name
while the DFSClient (in a mapper/reducer) still holds cached info about the
file and the job is still running.
- Mridul
Taeho Kang wrote:
Dear Hadoop Users and Developers,
I was wondering if there's a plan to add a "file info cache" to DFSClient?
It could eliminate the network round-trip cost of contacting the NameNode,
and I think it would greatly improve DFSClient's performance.
The code I was looking at is this:
-----------------------
DFSClient.java
  /**
   * Grab the open-file info from namenode
   */
  synchronized void openInfo() throws IOException {
    /* Maybe, we could add a file info cache here! */
    LocatedBlocks newInfo = callGetBlockLocations(src, 0, prefetchSize);
    if (newInfo == null) {
      throw new IOException("Cannot open filename " + src);
    }
    if (locatedBlocks != null) {
      Iterator<LocatedBlock> oldIter =
          locatedBlocks.getLocatedBlocks().iterator();
      Iterator<LocatedBlock> newIter =
          newInfo.getLocatedBlocks().iterator();
      while (oldIter.hasNext() && newIter.hasNext()) {
        if (!oldIter.next().getBlock().equals(newIter.next().getBlock())) {
          throw new IOException("Blocklist for " + src + " has changed!");
        }
      }
    }
    this.locatedBlocks = newInfo;
    this.currentNode = null;
  }
-----------------------
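As a rough illustration of what such a cache might look like, here is a
minimal TTL-based sketch keyed by file path. This is not part of Hadoop:
FileInfoCache, Entry, and all method names here are hypothetical. Note that
a TTL alone does not solve the remove-and-recreate problem raised above;
entries would still need explicit invalidation (or validation against fresh
NameNode data, as openInfo's block-list comparison already does).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a path-keyed file-info cache with a TTL.
// Not a Hadoop API; names are illustrative only.
public class FileInfoCache<V> {

  private static final class Entry<V> {
    final V value;
    final long expiresAtMillis;
    Entry(V value, long expiresAtMillis) {
      this.value = value;
      this.expiresAtMillis = expiresAtMillis;
    }
  }

  private final Map<String, Entry<V>> cache = new ConcurrentHashMap<>();
  private final long ttlMillis;

  public FileInfoCache(long ttlMillis) {
    this.ttlMillis = ttlMillis;
  }

  /** Returns the cached value, or null if absent or expired. */
  public V get(String path) {
    Entry<V> e = cache.get(path);
    if (e == null) {
      return null;
    }
    if (System.currentTimeMillis() >= e.expiresAtMillis) {
      cache.remove(path, e); // drop the stale entry
      return null;
    }
    return e.value;
  }

  /** Caches the value for ttlMillis from now. */
  public void put(String path, V value) {
    cache.put(path, new Entry<>(value, System.currentTimeMillis() + ttlMillis));
  }

  /** Explicit invalidation, e.g. after a delete or rename of the path. */
  public void invalidate(String path) {
    cache.remove(path);
  }
}
```

A caller would consult the cache before callGetBlockLocations and fall
through to the NameNode on a miss; the tricky part, as noted at the top of
this thread, is deciding when a hit can be trusted.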
Does anybody have an opinion on this matter?
Thank you in advance,
Taeho