In the current code, details about block locations of a file are
cached on the client when the file is opened. This cache remains with
the client until the file is closed. If the same file is re-opened by
the same DFSClient, it re-contacts the namenode and refetches the
block locations. This works well for most map-reduce apps because it is
rare that the same DFSClient re-opens the same file again.
Can you please explain your use case?
thanks,
dhruba
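
For reference, here is a minimal sketch of the kind of time-bounded block-location cache the quoted message proposes. All names, the TTL policy, and the use of `Object` in place of `LocatedBlocks` are hypothetical; nothing like this exists in the current DFSClient, and a real version would also need an invalidation story for the "blocklist has changed" case below.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of a time-bounded block-location cache that
 * openInfo() could consult before calling the namenode. Entries
 * expire after a fixed TTL so stale block lists are eventually
 * refetched.
 */
class LocatedBlocksCache {
    /** Cached entry: the block list plus the time it was fetched. */
    private static class Entry {
        final Object locatedBlocks;   // would be LocatedBlocks in HDFS
        final long fetchedAtMillis;
        Entry(Object blocks, long now) {
            this.locatedBlocks = blocks;
            this.fetchedAtMillis = now;
        }
    }

    private final Map<String, Entry> cache = new HashMap<>();
    private final long ttlMillis;

    LocatedBlocksCache(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /** Return the cached block list for src, or null if absent or expired. */
    synchronized Object get(String src, long nowMillis) {
        Entry e = cache.get(src);
        if (e == null || nowMillis - e.fetchedAtMillis > ttlMillis) {
            cache.remove(src);        // drop stale entries eagerly
            return null;
        }
        return e.locatedBlocks;
    }

    /** Record a freshly fetched block list for src. */
    synchronized void put(String src, Object blocks, long nowMillis) {
        cache.put(src, new Entry(blocks, nowMillis));
    }
}
```

On a cache miss or an expired entry, the client would fall through to `callGetBlockLocations()` exactly as today, then `put()` the result; the trade-off is that a cached open can observe a block list that the namenode has since changed.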
On Sun, Nov 2, 2008 at 10:57 PM, Taeho Kang <[EMAIL PROTECTED]> wrote:
> Dear Hadoop Users and Developers,
>
> I was wondering if there's a plan to add "file info cache" in DFSClient?
>
> It could eliminate the network round-trip cost of contacting the Namenode,
> and I think it would greatly improve the DFSClient's performance.
> The code I was looking at was this:
>
> -----------------------
> DFSClient.java
>
> /**
>  * Grab the open-file info from namenode
>  */
> synchronized void openInfo() throws IOException {
>   /* Maybe, we could add a file info cache here! */
>   LocatedBlocks newInfo = callGetBlockLocations(src, 0, prefetchSize);
>   if (newInfo == null) {
>     throw new IOException("Cannot open filename " + src);
>   }
>   if (locatedBlocks != null) {
>     Iterator<LocatedBlock> oldIter = locatedBlocks.getLocatedBlocks().iterator();
>     Iterator<LocatedBlock> newIter = newInfo.getLocatedBlocks().iterator();
>     while (oldIter.hasNext() && newIter.hasNext()) {
>       if (!oldIter.next().getBlock().equals(newIter.next().getBlock())) {
>         throw new IOException("Blocklist for " + src + " has changed!");
>       }
>     }
>   }
>   this.locatedBlocks = newInfo;
>   this.currentNode = null;
> }
> -----------------------
>
> Does anybody have an opinion on this matter?
>
> Thank you in advance,
>
> Taeho
>