Taking it to dev@ to see if there is any interest.
It is a good and interesting requirement. I can see hacking 'pre-setup'
storage with tags to achieve this, but it is going to be a fragile hack.
I believe GCE also has the concept of some instance types having
dedicated spindles.

On 6/6/13 11:14 AM, "David Ortiz" <dpor...@outlook.com> wrote:

>Chiradeep,
>    Currently I am working with KVM hypervisor nodes. The use case of
>having 4 spindles and assigning one to each node is exactly what I would
>like to do. For the moment I have all four spindles configured in a RAID
>with the cloudstack local storage pointed at it.
>
>Shanker,
>    I had not seen that slideshow yet, so thank you for pointing me to
>it. As of now, the hadoop resources I am using are statically allocated
>between 4 hosts. As it stands now, I am constrained to those resources
>without the ability to add any additional storage cluster (or additional
>storage to my current shared storage appliance), or additional nodes.
>Fortunately, my use cases don't require any kind of reallocation of the
>hadoop nodes. It's more clients for the cluster as well as web service
>nodes that run clients that are being dynamically spun up and down. I
>have found that I can get through my jobs alright; they just take a lot
>of extra time to run since I have the storage acting as a bottleneck
>right now.
>
>Thanks,
>    David Ortiz
>
>> From: run...@gmail.com
>> Subject: Re: Hadoop cluster running in cloudstack
>> Date: Thu, 6 Jun 2013 10:23:50 -0400
>> To: us...@cloudstack.apache.org
>>
>>
>> On Jun 6, 2013, at 4:05 AM, Shanker Balan
>> <shanker.ba...@shapeblue.com> wrote:
>>
>> > On 05-Jun-2013, at 12:13 AM, David Ortiz <dpor...@outlook.com> wrote:
>> >
>> >> Hello,
>> >>     Has anyone tried running a hadoop cluster in a cloudstack
>> >> environment? I have set one up, but I am finding that I am having
>> >> some IO contention between slave nodes on each host since they all
>> >> share one local storage pool. As I understand it, there is not
>> >> currently a method for using multiple local storage pools with VMs
>> >> through cloudstack. Has anyone found a workaround for this by any
>> >> chance?
>> >
>> >
>> > Hi David,
>> >
>> > Have you seen Seb's
>> > http://www.slideshare.net/sebastiengoasguen/cloudstack-and-bigdata
>> > slides yet?
>>
>> As a quick disclaimer, the various configurations I highlight in this
>> deck are a bit hand wavy and I did not test them. I just made a guess
>> about how one might want to use the baremetal functionality in
>> cloudstack. The main distinction being between using a "big data" store
>> as storage backends of cloudstack and using cloudstack to provision a
>> bigdata store on-demand.
>>
>> -sebastien
>>
>> >
>> > In my experience running Hadoop (100+ nodes) on traditional servers,
>> > it's going to be really hard to scale up Hadoop workloads using local
>> > storage and HDFS on a cloud.
>> >
>> > I ran out of IOPS very quickly. There was enough CPU headroom but
>> > could not add more slots as disk became the bottleneck. Every time
>> > there was a node/disk failure, rebalancing was a nightmare with a 3x
>> > HDFS replication factor.
>> >
>> > If I were to run Hadoop on an IaaS cloud, I would do it very similar
>> > to Amazon AWS EMR - instances backed by a "Storage As A Service" layer
>> > (S3) for big data instead of HDFS.
>> >
>> > The system would work as below:
>> >
>> > - Create a dedicated big data storage tier using a distributed
>> > filesystem like Gluster/Ceph/Isilon. Most of the vendors now provide
>> > S3 compat connectors for Hadoop.
>> >
>> > http://ceph.com/docs/master/cephfs/hadoop/
>> > http://gluster.org/community/documentation/index.php/Hadoop
>> > http://www.emc.com/big-data/scale-out-storage-hadoop.htm
>> >
>> > - Hadoop instances are spun up on bare metal or on hypervisors. The
>> > service offerings for "big data" instances would run on dedicated
>> > hypervisors (via tags) with high bandwidth network connectivity to the
>> > storage service.
>> >
>> > - Hadoop instances use local storage for run time data.
>> >
>> > - Hadoop VMs connect to the storage tier via connectors for permanent
>> > storage.
>> >
>> > Benefits:
>> >
>> > - Spinning up/down VMs doesn't cause HDFS rebalancing as there is no
>> > HDFS anywhere.
>> >
>> > - Scale out VMs independently of storage. Add more spindles / nodes
>> > to the storage cluster to scale out IOPS and capacity.
>> >
>> > - Easy upgrade of Hadoop releases without risk to data.
>> >
>> > Regards.
>> > @shankerbalan
>> >
>> > --
>> > Shanker Balan
>> > Managing Consultant
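
For anyone picking this up on dev@: the "pre-setup storage with tags"
idea at the top of this mail could be modelled roughly as below. This is
only a toy sketch in Python, not CloudStack code. LocalPool, pick_pool
and place_volume, the host name kvm1 and the tag hadoop-spindle are all
made up for illustration. It assumes each spindle has been registered by
hand as its own tagged pool, and that a volume may only land on a pool
carrying every tag its offering asks for.

```python
from dataclasses import dataclass, field


@dataclass
class LocalPool:
    """One pre-configured local storage pool, e.g. a single spindle (hypothetical model)."""
    name: str
    host: str
    tags: frozenset
    volumes: list = field(default_factory=list)


def pick_pool(pools, host, required_tags):
    """Return the least-loaded pool on `host` that carries every required tag."""
    required = frozenset(required_tags)
    candidates = [p for p in pools if p.host == host and required <= p.tags]
    if not candidates:
        raise LookupError(f"no pool on {host} with tags {sorted(required)}")
    # Ties go to the first candidate in list order.
    return min(candidates, key=lambda p: len(p.volumes))


def place_volume(pools, host, volume, required_tags):
    """Record `volume` on the chosen pool and return that pool's name."""
    pool = pick_pool(pools, host, required_tags)
    pool.volumes.append(volume)
    return pool.name


# Four spindles on one KVM host, each set up by hand as its own tagged pool.
pools = [
    LocalPool(f"kvm1-spindle{i}", "kvm1", frozenset({"hadoop-spindle"}))
    for i in range(4)
]

# Four Hadoop slave VMs on that host each ask for a "hadoop-spindle" volume.
placements = [
    place_volume(pools, "kvm1", f"slave{i}-data", ["hadoop-spindle"])
    for i in range(4)
]
print(placements)
```

With four slaves and four spindles, each data volume ends up on its own
pool, which is the one-spindle-per-node layout David describes. A fifth
volume would simply share a spindle again, and nothing keeps the tags in
step with the physical disks.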