Re: Is it possible to run multiple MapReduce against the same HDFS?

Zhenhua (Gerald) Guo Mon, 10 Oct 2011 19:57:07 -0700

Thanks, Robert.  I will look into hod.

When the MapReduce framework accesses data stored in HDFS, which
account is used: the account that the MapReduce daemons (e.g. the job
tracker) run as, or the account of the user who submits the job?  If
the HDFS and MapReduce clusters run under different accounts, will the
MapReduce cluster be able to access HDFS directories and files (if
authentication in HDFS is enabled)?


Thanks!

Gerald

On Mon, Oct 10, 2011 at 12:36 PM, Robert Evans <ev...@yahoo-inc.com> wrote:
> It should be possible to use multiple map/reduce clusters sharing the same
> HDFS; you can look at hod, where it launches a JT on demand.  The only chance
> of collision that I can think of would be if by some odd chance both Job
> Trackers were started at exactly the same millisecond.  The JT uses the time
> it was started as part of the job id for all jobs.  Those job ids are assumed
> to be unique and used to create files/directories in HDFS to store data for
> that job.
>
> --Bobby Evans
>
> On 10/7/11 12:09 PM, "Zhenhua (Gerald) Guo" <jen...@gmail.com> wrote:
>
> I plan to deploy an HDFS cluster which will be shared by multiple
> MapReduce clusters.
> I wonder whether this is possible.  Will it incur any conflicts among
> the MapReduce clusters (e.g. different MapReduce clusters trying to use
> the same temp directory in HDFS)?
> If it is possible, how should the security parameters be set up (e.g.
> user identity, file permissions)?
>
> Thanks,
>
> Gerald
>
>
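
For illustration only, the job id collision Bobby describes can be
sketched as below.  This is not Hadoop code: the function names, the
"job_<identifier>_<sequence>" layout, the millisecond timestamp format,
and the base directory are all assumptions made for the example.  It
only shows why two JTs started at the same instant would end up with
the same per-job path on a shared HDFS.

```python
from datetime import datetime


def jt_identifier(start: datetime) -> str:
    """Hypothetical JT identifier: start time at millisecond resolution."""
    millis = start.microsecond // 1000
    return start.strftime("%Y%m%d%H%M%S") + f"{millis:03d}"


def job_id(jt_start: datetime, seq: int) -> str:
    """Hypothetical job id: JT identifier plus a per-JT sequence number."""
    return f"job_{jt_identifier(jt_start)}_{seq:04d}"


def job_dir(base: str, jt_start: datetime, seq: int) -> str:
    """Directory a JT would create on the shared HDFS for one job."""
    return f"{base}/{job_id(jt_start, seq)}"


# Assumed base directory, not a real Hadoop default.
BASE = "/shared/mapred/jobs"

jt_a = datetime(2011, 10, 10, 19, 57, 7, 123000)
jt_b = datetime(2011, 10, 10, 19, 57, 7, 124000)  # started 1 ms later

# Different start times give different paths for the first job of each JT.
distinct = job_dir(BASE, jt_a, 1) != job_dir(BASE, jt_b, 1)

# Identical start times give the same path: the collision case.
collides = job_dir(BASE, jt_a, 1) == job_dir(BASE, jt_a, 1)
```

Under these assumptions, a collision needs both JTs to produce the same
identifier, which is why it is unlikely in practice.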
