Hi,

> I am actually doing some tests to see the performance. I want to eliminate the
> interference of the distributed cache. I found there is a method in the API to
> purge the cache. That might be what I want.
So, you want to run multiple versions of a job (possibly with different job parameters) and measure them relative to each other. Is that correct? I can think of some options:

- Is it possible not to use the distributed cache at all? You could possibly bundle the files along with the job jar.
- You could run the job on fresh cluster instances (a more costly option, nevertheless).
- You could change the timestamps of the distributed cache files on DFS somehow before each invocation of the job. This will make Hadoop believe that the files have changed, which will cause the distributed cache to fetch the files again.

The purgeCache API you are seeing is very MapReduce-framework specific. It is *not* to be used by client code and is not guaranteed to work. In later versions of Hadoop (0.21 and trunk), these methods have been deprecated in the public API and will be removed altogether.

Thanks
Hemanth

> Thanks,
> -Gang
>
> ----- Original Message ----
> From: Hemanth Yamijala <yhema...@gmail.com>
> To: common-user@hadoop.apache.org
> Date: 2010/8/2 (Mon) 12:56:25 AM
> Subject: Re: reuse cached files
>
> Hi,
>
>> Thanks Hemanth. Is there any way to invalidate the reuse and ask Hadoop to
>> resend exactly the same files to the cache for every job?
>
> I may be able to answer this better if I understand the use case. If
> you need the same files for every job, why would you need to send them
> afresh each time? If something is cached, it can be reused, no? I am
> sure I must be missing something in your requirement ...
>
> Thanks
> Hemanth
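To make the timestamp option above concrete: the distributed cache reuses a local copy only while the file's modification time on DFS matches the time recorded when the file was localized, so bumping the mtime forces a re-fetch. Below is a minimal, self-contained sketch of that keying logic in plain Java; the class and method names are illustrative, not Hadoop's actual implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative model of timestamp-keyed cache invalidation: a file is
// re-fetched only when its recorded modification time changes.
public class CacheSketch {
    // Maps a DFS path to the mtime recorded for the locally cached copy.
    private final Map<String, Long> localized = new HashMap<>();

    /** Returns true if the file had to be (re-)fetched from DFS. */
    public boolean localize(String dfsPath, long dfsMtime) {
        Long cachedMtime = localized.get(dfsPath);
        if (cachedMtime != null && cachedMtime == dfsMtime) {
            return false; // timestamps match: reuse the local copy
        }
        localized.put(dfsPath, dfsMtime); // fetch (or re-fetch) the file
        return true;
    }

    public static void main(String[] args) {
        CacheSketch cache = new CacheSketch();
        System.out.println(cache.localize("/dict.txt", 1000L)); // first job: fetched
        System.out.println(cache.localize("/dict.txt", 1000L)); // second job: reused
        // "Touching" the file on DFS bumps its mtime, so the next job re-fetches it.
        System.out.println(cache.localize("/dict.txt", 2000L));
    }
}
```

In other words, anything that changes the file's DFS timestamp between runs (e.g. re-uploading the same file) defeats the reuse and gives each job a fresh copy.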