Re: Hadoop-on-demand and torque

Brian Bockelman Sun, 20 May 2012 20:11:10 -0700

Hi Ralph,

I admit - I've only been half-following the OpenMPI progress.  Do you have a 
technical write-up of what has been done?


Thanks,

Brian

On May 20, 2012, at 9:31 AM, Ralph Castain wrote:

> FWIW: Open MPI now has an initial cut at "MR+" that runs map-reduce under any 
> HPC environment. We don't have the Java integration yet to support the Hadoop 
> MR class, but you can write a mapper/reducer and execute that programming 
> paradigm. We plan to integrate the Hadoop MR class soon.
> 
> If you already have that integration, we'd love to help port it over. We 
> already have the MPI support completed, so any mapper/reducer could use it.
> 
> 
> On May 20, 2012, at 7:12 AM, Pierre Antoine DuBoDeNa wrote:
> 
>> We run similar infrastructure in a university project. We plan to install
>> Hadoop, and are looking for "alternatives" based on Hadoop in case pure
>> Hadoop does not work as expected.
>> 
>> Keep us updated on the code release.
>> 
>> Best,
>> PA
>> 
>> 2012/5/20 Stijn De Weirdt <stijn.dewei...@ugent.be>
>> 
>>> hi all,
>>> 
>>> i'm part of an HPC group of a university, and we have some users that are
>>> interested in Hadoop to see if it can be useful in their research and we
>>> also have researchers that are using hadoop already on their own
>>> infrastructure, but that is not enough reason for us to start with
>>> dedicated Hadoop infrastructure (we are now only running torque
>>> based clusters with and without shared storage; setting up and properly
>>> maintaining Hadoop infrastructure requires quite some understanding of new
>>> software)
>>> 
>>> to be able to support these needs we wanted to do just this: use current
>>> HPC infrastructure to make private hadoop clusters so people can do some
>>> work. if we attract enough interest, we will probably set up dedicated
>>> infrastructure, but by that time we (the admins) will also have a better
>>> understanding of what is required.
>>> 
>>> so we used to look at HOD for testing/running hadoop on existing
>>> infrastructure (never really looked at myhadoop though).
>>> but (imho) the current HOD code base is not in such a good state. we did
>>> some work to get it working and added some features, to come to the
>>> conclusion that it was not sufficient (and not maintainable).
>>> 
>>> so we wrote something from scratch with same functionality as HOD, and
>>> much more (eg HBase is now possible, with or without MR1; some default
>>> tuning; easy to add support for yarn instead of MR1).
>>> it has some support for torque, but my laptop is also sufficient. (the
>>> torque support is a wrapper to submit the job)
>>> we gave a workshop on hadoop using it (25 people, and each with their own
>>> 5 node hadoop cluster) and it went rather well.
>>> 
>>> it's not in a public repo yet, but we could do that. if interested, let me
>>> know, and i'll see what can be done. (releasing the code is on our todo list,
>>> but if there is some demand, we can do it sooner)
>>> 
>>> 
>>> stijn
>>> 
>>> 
>>> 
>>> On 05/18/2012 05:07 PM, Pierre Antoine DuBoDeNa wrote:
>>> 
>>>> I am also interested to learn about myHadoop as I use a shared storage
>>>> system and everything runs on VMs and not actual dedicated servers.
>>>> 
>>>> In an Amazon EC2-like environment, where you just have VMs and huge central
>>>> storage, is it helpful to use Hadoop to distribute jobs and maybe
>>>> parallelize algorithms, or is it better to go with other technologies?
>>>> 
>>>> 2012/5/18 Manu S <manupk...@gmail.com>
>>>> 
>>>>> Hi All,
>>>>> 
>>>>> Guess HOD could be useful for an existing HPC cluster with the Torque
>>>>> scheduler that needs to run map-reduce jobs.
>>>>> 
>>>>> Also read about *myHadoop - Hadoop on demand on traditional HPC
>>>>> resources*, which will support many HPC schedulers like SGE, PBS etc. to
>>>>> overcome the problem of integrating shared architecture (HPC) and
>>>>> shared-nothing architecture (Hadoop).
>>>>> 
>>>>> Are there any real use case scenarios for integrating Hadoop map/reduce in
>>>>> an existing HPC cluster, and what are the advantages of using Hadoop
>>>>> features in an HPC cluster?
>>>>> 
>>>>> Appreciate your comments on the same.
>>>>> 
>>>>> Thanks,
>>>>> Manu S
>>>>> 
>>>>> 
>>>>> 
>>>>> On Fri, May 18, 2012 at 12:41 AM, Merto Mertek <masmer...@gmail.com>
>>>>> wrote:
>>>>> 
>>>>>> If I understand it right HOD is mentioned mainly for merging existing
>>>>>> HPC clusters with hadoop and for testing purposes..
>>>>>> 
>>>>>> I cannot find what is the role of Torque here (just initial nodes
>>>>>> allocation?) and which is the default scheduler of HOD ?  Probably the
>>>>>> scheduler from the hadoop distribution?
>>>>>> 
>>>>>> In the doc is mentioned a MAUI scheduler, but probably if there would be
>>>>>> an integration with hadoop there will be any document on it..
>>>>>> 
>>>>>> thanks..
>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>>
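
For readers new to the mapper/reducer paradigm Ralph refers to, here is a
minimal single-process sketch in Python. It is only an illustration of the
programming model (a word count): it does not use Open MPI's "MR+" or the
Hadoop MR class, and every function name in it (mapper, shuffle, reducer,
run_job) is hypothetical, chosen for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def mapper(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as a framework would between steps."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reducer(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: collapse all values for one key into a single count."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in one process (no cluster involved)."""
    pairs = (pair for line in lines for pair in mapper(line))
    return dict(reducer(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(run_job(["hadoop on torque", "Hadoop on demand"]))
```

In a real deployment the framework, not the user, would distribute the map
and reduce calls across nodes and perform the shuffle over the network; the
user typically supplies only the mapper and reducer.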
