Hi Danushka,

For the EC2 integration testing, I created an AMI containing an application that performs gene sequence alignment using the Smith-Waterman algorithm. I used some sample gene sequences to run that application. Apart from those samples, I don't have access to a large set of gene sequence data. Even if I did have access to such data, I don't think I would have the necessary rights to share it.

Thanks,
Heshan.

On Thu, Apr 4, 2013 at 8:54 AM, Lahiru Gunathilake <[email protected]> wrote:

> Hi Danushka,
>
> I believe Heshan have some dataset because he did the EC2 integration to
> Airavata.
>
> Heshan, Do you have any comments on this topic ?
>
> Regards
> Lahiru
>
>
> On Thu, Apr 4, 2013 at 7:29 AM, Danushka Menikkumbura <[email protected]> wrote:
>
> > Hi Devs,
> >
> > I have been working on adding support for cloud bursting in XBaya (see
> > thread "0.7 release plan"). As the work is nearing completion, I am now
> > thinking of a usecase to design a performance test to justify the value
> > addition of Hadoop integration effort [1].
> >
> > I need your assistance at this point for few things.
> >
> > 1. To come up with a good usecase that is based on some data-intensive
> > scientific computing scenario. As I believe a scenario from any scientific
> > paradigm would come in handy rather than going for a standard Hadoop
> > scenario (like counting words) on a larger data set.
> >
> > 2. A data set to test the usecase
> >
> > 3. A run environment to set it up and do few rounds of testing.
> >
> > I am really curious to know what you think.
> >
> > Also, I hope Milinda would also have something to add.
> >
> > [1] - https://issues.apache.org/jira/browse/AIRAVATA-357
> >
> > Thanks,
> > Danushka
> >
>
>
> --
> System Analyst Programmer
> PTI Lab
> Indiana University
>

--
Regards,
Heshan Suriyaarachchi

http://heshans.blogspot.com/
http://www.linkedin.com/in/heshan
