Consider the case of a file being removed and recreated with the same name
while the DFSClient (in a mapper/reducer) still holds cached info about the
file and the job is still running.
- Mridul
Taeho Kang wrote:
Dear Hadoop Users and Developers,
I was wondering if there's a plan to add a "file info cache" to DFSClient?
It could eliminate the network round-trip cost of contacting the NameNode,
and I think it would greatly improve DFSClient's performance.
The code I was looking at is this:
-----------------------
DFSClient.java
  /**
   * Grab the open-file info from namenode
   */
  synchronized void openInfo() throws IOException {
    /* Maybe, we could add a file info cache here! */
    LocatedBlocks newInfo = callGetBlockLocations(src, 0, prefetchSize);
    if (newInfo == null) {
      throw new IOException("Cannot open filename " + src);
    }
    if (locatedBlocks != null) {
      Iterator<LocatedBlock> oldIter =
          locatedBlocks.getLocatedBlocks().iterator();
      Iterator<LocatedBlock> newIter =
          newInfo.getLocatedBlocks().iterator();
      while (oldIter.hasNext() && newIter.hasNext()) {
        if (!oldIter.next().getBlock().equals(newIter.next().getBlock())) {
          throw new IOException("Blocklist for " + src + " has changed!");
        }
      }
    }
    this.locatedBlocks = newInfo;
    this.currentNode = null;
  }
-----------------------
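As a rough illustration of what such a cache might look like, here is a
minimal TTL-based sketch keyed by file path. This is not part of Hadoop:
FileInfoCache, Entry, and all method names here are hypothetical. Note that
a TTL alone does not solve the remove-and-recreate problem raised above;
entries would still need explicit invalidation (or validation against fresh
NameNode data, as openInfo's block-list comparison already does).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a path-keyed file-info cache with a TTL.
// Not a Hadoop API; names are illustrative only.
public class FileInfoCache<V> {

  private static final class Entry<V> {
    final V value;
    final long expiresAtMillis;
    Entry(V value, long expiresAtMillis) {
      this.value = value;
      this.expiresAtMillis = expiresAtMillis;
    }
  }

  private final Map<String, Entry<V>> cache = new ConcurrentHashMap<>();
  private final long ttlMillis;

  public FileInfoCache(long ttlMillis) {
    this.ttlMillis = ttlMillis;
  }

  /** Returns the cached value, or null if absent or expired. */
  public V get(String path) {
    Entry<V> e = cache.get(path);
    if (e == null) {
      return null;
    }
    if (System.currentTimeMillis() >= e.expiresAtMillis) {
      cache.remove(path, e); // drop the stale entry
      return null;
    }
    return e.value;
  }

  /** Caches the value for ttlMillis from now. */
  public void put(String path, V value) {
    cache.put(path, new Entry<>(value, System.currentTimeMillis() + ttlMillis));
  }

  /** Explicit invalidation, e.g. after a delete or rename of the path. */
  public void invalidate(String path) {
    cache.remove(path);
  }
}
```

A caller would consult the cache before callGetBlockLocations and fall
through to the NameNode on a miss; the tricky part, as noted at the top of
this thread, is deciding when a hit can be trusted.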
Does anybody have an opinion on this matter?
Thank you in advance,
Taeho