Hi Taeho,
Thanks for your explanation. If your application opens a dfs file and
does not close it, then the DFSClient will automatically keep block
locations cached. So, you could achieve your desired goal by
developing a cache layer (above HDFS) that does not close the HDFS
file even if the user has closed it. This cache layer needs to manage
the pool of HDFS file handles.
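For what it's worth, the handle pool described above can be sketched as a small LRU cache of open streams. Everything below (the class and method names, the Opener callback, the eviction policy) is my own illustration, not Hadoop code; a real version would hold HDFS input streams opened via FileSystem.open():

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.LinkedHashMap;

/** Hypothetical LRU pool of open file handles, keyed by path.
 *  The value type is any Closeable so the sketch stands on its own;
 *  in practice it would be an HDFS input stream. */
class HandlePool<T extends Closeable> {
  private final int capacity;
  private final LinkedHashMap<String, T> pool;

  HandlePool(int capacity) {
    this.capacity = capacity;
    // accessOrder = true makes iteration order least-recently-used first
    this.pool = new LinkedHashMap<String, T>(16, 0.75f, true);
  }

  /** Return the cached handle for this path, or open and cache a new one. */
  synchronized T get(String path, Opener<T> opener) throws IOException {
    T handle = pool.get(path);
    if (handle == null) {
      handle = opener.open(path);
      pool.put(path, handle);
      evictIfNeeded();
    }
    return handle;
  }

  private void evictIfNeeded() throws IOException {
    while (pool.size() > capacity) {
      String eldest = pool.keySet().iterator().next();
      pool.remove(eldest).close(); // only now is the real file closed
    }
  }

  /** Callback that actually opens the underlying file. */
  interface Opener<T> { T open(String path) throws IOException; }
}
```

The user-visible close() would become a no-op in such a layer; the underlying HDFS stream is only closed when its entry is evicted from the pool, so the DFSClient's block-location cache survives across re-opens of the same file.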
Does this help?
thanks,
dhruba
On Fri, Nov 7, 2008 at 12:53 AM, Taeho Kang <[EMAIL PROTECTED]> wrote:
> Hi, thanks for your reply Dhruba,
>
> One of my co-workers is writing a BigTable-like application that could be
> used for online, near-real-time services. Since the application could be
> hooked into online services, there would be times when a large number of
> users (e.g. 1000 users) request access to a few files in a very short time.
>
> Of course, in a batch-processing job this is a rare case, but for online
> services it's quite common.
> I think HBase developers would have run into similar issues as well.
>
> Is this enough explanation?
>
> Thanks in advance,
>
> Taeho
>
>
>
> On Tue, Nov 4, 2008 at 3:12 AM, Dhruba Borthakur <[EMAIL PROTECTED]> wrote:
>
>> In the current code, details about block locations of a file are
>> cached on the client when the file is opened. This cache remains with
>> the client until the file is closed. If the same file is re-opened by
>> the same DFSClient, it re-contacts the namenode and refetches the
>> block locations. This works ok for most map-reduce apps because it is
>> rare that the same DFSClient re-opens the same file again.
>>
>> Can you please explain your use-case?
>>
>> thanks,
>> dhruba
>>
>>
>> On Sun, Nov 2, 2008 at 10:57 PM, Taeho Kang <[EMAIL PROTECTED]> wrote:
>> > Dear Hadoop Users and Developers,
>> >
>> > I was wondering if there's a plan to add "file info cache" in DFSClient?
>> >
>> > It could eliminate the network round trip for contacting the Namenode,
>> > and I think it would greatly improve the DFSClient's performance.
>> > The code I was looking at is this:
>> >
>> > -----------------------
>> > DFSClient.java
>> >
>> > /**
>> >  * Grab the open-file info from namenode
>> >  */
>> > synchronized void openInfo() throws IOException {
>> >   /* Maybe, we could add a file info cache here! */
>> >   LocatedBlocks newInfo = callGetBlockLocations(src, 0, prefetchSize);
>> >   if (newInfo == null) {
>> >     throw new IOException("Cannot open filename " + src);
>> >   }
>> >   if (locatedBlocks != null) {
>> >     Iterator<LocatedBlock> oldIter =
>> >         locatedBlocks.getLocatedBlocks().iterator();
>> >     Iterator<LocatedBlock> newIter =
>> >         newInfo.getLocatedBlocks().iterator();
>> >     while (oldIter.hasNext() && newIter.hasNext()) {
>> >       if (!oldIter.next().getBlock().equals(newIter.next().getBlock())) {
>> >         throw new IOException("Blocklist for " + src + " has changed!");
>> >       }
>> >     }
>> >   }
>> >   this.locatedBlocks = newInfo;
>> >   this.currentNode = null;
>> > }
>> > -----------------------
>> >
>> > Does anybody have an opinion on this matter?
>> >
>> > Thank you in advance,
>> >
>> > Taeho
>> >
>>
>