I met Christophe at the Hadoop Conference at Yahoo last week, and I really liked him. He asked me to maintain the Google Ubuntu Hadoop image, and I sent him the following about my project. Would you read it and offer any comments?
I sent him the following:

"Can I tell you more about my Hadoop in education project? My project started when I found out that Amazon.com (who would have thought) will let you rent their computers by the hour. I realized this would be an ideal way for small schools (really ALL schools - even MIT/Berkeley has a hard time coming up with a hundred idle computers for a student to use) to get access to the resources needed to expose students (not just in Computer Science, but in Physics, Astronomy, Biology, etc.) to working in this environment. That is what my independent study project is about.

I am producing a student client workstation image and a department server image with everything needed to teach a course and hook up to Amazon. I am getting the courseware from the University of Washington and documentation from all over. I first emailed the Hadoop list, and Aaron Kimball responded, offering his courseware and writing that they "highly endorse your Amazon EC2 idea for doing your labs". I would love any resources you can point me to.

The economic model is that students don't need to buy a textbook and instead use the money to buy computer time from Amazon. I have already gotten an agency of the state of California to fund my computer time as a student, so that is a good precedent. I am getting weather data and an application to process it with snazzy graphics as a student project. I hope to add more as time goes on.

I know I can be accused of using the buzzword of the moment, but I hope to put software on the server image that links up with other people using the server to form a community. The more communication, the faster things will happen. So the model is more "seeding" than "doing". This is an example of what I am putting into this. This needs to work "on its own", with common problems pre-solved, without a lot of case-by-case work at each institution.
I talked to Jinesh Varia from Amazon, and I may be able to get them to design a custom product for education, with accounting and billing that work with my server, to make it simple and secure for students and teachers to use. If a teacher has to deal with money and billing, it will fail. If you require a school's bureaucracy to handle something new, it will be a harder sell. Schools supply student course needs through the school bookstore. Who is the approved vendor to the school bookstore? Amazon? More case-by-case work. And then they want to mark it up 50%-100%. No no no. So a student logs into my server and buys time like you would buy anything else on the Internet. But bookkeeping and usage records are kept for the teacher. Simple. No extra work in this area to offer the course.

When I met you, I felt I had met someone who thought exactly like me about how important it is to help people, not just from CS but from other areas, take advantage of the potential of this technology. I want to make this available to the guy off in a corner somewhere who has a crazy idea that deserves a Nobel Prize, so that he has what it takes to succeed. As much as possible, elite should mean elite based on worth, not restricted by access to elite-level resources."

I would appreciate your comments. If you like the ideas, any support you could give yourself, and any encouragement you could give Christophe to support this, would be appreciated.

BTW - my personal email is [EMAIL PROTECTED] electricranch - "herds of CPU's". I use the gmail address for lists that may expose me to spam.

Bruce

On Fri, Nov 16, 2007 at 7:21 PM, Aaron Kimball <[EMAIL PROTECTED]> wrote:
> Bruce,
>
> I helped design and teach an undergrad course based on Hadoop last year.
> Along with some folks at Google, we then made the resources available
> together to distribute to other universities and the public at large (via
> Creative Commons license, actually).
>
> All the materials are available online here:
> http://code.google.com/edu/content/parallel.html
> (lecture notes, labs, and even video lectures.)
>
> It includes suggested lab activities. Good free data sets you can download
> include Netflix prize data and a copy of the wikipedia corpus. Of course,
> you can set up Nutch and do your own web crawl too.
>
> We also highly endorse the Amazon EC2 idea for doing your own labs :)
>
> Best of luck,
> - Aaron
>
> Edward Bruce Williams wrote:
> > Hello
> >
> > I am a student doing an independent study project investigating the
> > possibility of teaching large scale computing on a small scale budget. Th
> >
> > My thought is to use available Open Source ( Hadoop) and Creative Commons
> > and other materials as the text. A student could then do significant
> > computing on Amazon for the cost of what they would usually pay for a
> > textbook. I have convinced an agency of the state of California that paying
> > for computer time for a CS student is "like buying a textbook or calculator
> > for a math student", so "so far so good."
> >
> > I am asking if anyone has some largish data sets, preferably on Amazon, we
> > could use for class projects to contact me off list.
> >
> > Thanks,
> >
> > Bruce Williams