In the current code, details about the block locations of a file are
cached on the client when the file is opened. This cache remains with
the client until the file is closed. If the same file is re-opened by
the same DFSClient, it re-contacts the namenode and refetches the
block locations. This works OK for most map-reduce apps because it is
rare that the same DFSClient re-opens the same file again.
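If the goal is simply to avoid refetching block locations on re-open, a
minimal sketch of such a client-side cache might look like the following.
This is only an illustration, not actual HDFS code: the class and method
names (BlockInfoCache, getIfFresh) are hypothetical, and the block-location
payload is an opaque Object standing in for LocatedBlocks.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical per-client cache of file block locations, keyed by path.
 * Entries expire after a TTL so the client eventually re-asks the namenode.
 */
public class BlockInfoCache {
    // Cached entry: the block-location payload plus the time it was fetched.
    static final class CachedLocations {
        final Object locatedBlocks;   // stand-in for LocatedBlocks
        final long fetchedAtMillis;
        CachedLocations(Object blocks, long at) {
            this.locatedBlocks = blocks;
            this.fetchedAtMillis = at;
        }
    }

    private final Map<String, CachedLocations> cache = new ConcurrentHashMap<>();
    private final long ttlMillis;

    public BlockInfoCache(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /** Return cached locations if still fresh, else null (caller contacts the namenode). */
    public Object getIfFresh(String src, long nowMillis) {
        CachedLocations entry = cache.get(src);
        if (entry == null || nowMillis - entry.fetchedAtMillis > ttlMillis) {
            cache.remove(src);   // drop stale entries eagerly
            return null;
        }
        return entry.locatedBlocks;
    }

    /** Record locations just fetched from the namenode. */
    public void put(String src, Object locatedBlocks, long nowMillis) {
        cache.put(src, new CachedLocations(locatedBlocks, nowMillis));
    }
}
```

The TTL matters because block locations can change underneath the client
(e.g. re-replication after a datanode failure), so a cache like this trades
staleness for fewer namenode round trips.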

Can you please explain your use case?

thanks,
dhruba


On Sun, Nov 2, 2008 at 10:57 PM, Taeho Kang <[EMAIL PROTECTED]> wrote:
> Dear Hadoop Users and Developers,
>
> I was wondering if there's a plan to add "file info cache" in DFSClient?
>
> It could eliminate the network round-trip cost of contacting the Namenode,
> and I think it would greatly improve the DFSClient's performance.
> The code I was looking at was this
>
> -----------------------
> DFSClient.java
>
>    /**
>     * Grab the open-file info from namenode
>     */
>    synchronized void openInfo() throws IOException {
>      /* Maybe, we could add a file info cache here! */
>      LocatedBlocks newInfo = callGetBlockLocations(src, 0, prefetchSize);
>      if (newInfo == null) {
>        throw new IOException("Cannot open filename " + src);
>      }
>      if (locatedBlocks != null) {
>        Iterator<LocatedBlock> oldIter = locatedBlocks.getLocatedBlocks().iterator();
>        Iterator<LocatedBlock> newIter = newInfo.getLocatedBlocks().iterator();
>        while (oldIter.hasNext() && newIter.hasNext()) {
>          if (! oldIter.next().getBlock().equals(newIter.next().getBlock())) {
>            throw new IOException("Blocklist for " + src + " has changed!");
>          }
>        }
>      }
>      this.locatedBlocks = newInfo;
>      this.currentNode = null;
>    }
> -----------------------
>
> Does anybody have an opinion on this matter?
>
> Thank you in advance,
>
> Taeho
>
