Hadoop isn't going to like losing its datanodes when people shut down their computers. More importantly, while the datanodes are running, your users will be affected by data replication traffic. Unlike SETI@home, Hadoop doesn't know when the user's screensaver is running, so it will start doing work whenever it feels like it.
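
To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It is not anything Hadoop ships: the function names and all input figures are made up for illustration. It assumes every shutdown lasts long enough for the namenode to declare the datanode dead and re-copy the blocks it held, and that machines go offline independently of each other.

```python
def daily_rereplication_gb(num_nodes, gb_per_node, shutdowns_per_node_per_day):
    """Rough upper bound on data re-copied per day to restore replication.

    Assumption: each shutdown causes everything stored on that node
    to be re-replicated once elsewhere in the cluster.
    """
    return num_nodes * gb_per_node * shutdowns_per_node_per_day


def all_replicas_offline(p_offline, replication=3):
    """Chance that every replica of a block sits on a powered-off machine.

    Assumption: each machine is off with probability p_offline,
    independently of the others.
    """
    return p_offline ** replication


if __name__ == "__main__":
    # Hypothetical cluster: 1000 desktops, 10 GB each, one shutdown per day.
    print(daily_rereplication_gb(1000, 10, 1), "GB re-copied per day (upper bound)")
    # Hypothetical: machines are off half the time, 3 replicas per block.
    print(all_replicas_offline(0.5, 3), "chance a given block is unreachable")
```

Even with small per-node storage, the re-copy traffic scales with the number of machines, and machines that are off much of the day leave a noticeable share of blocks unreachable at any given moment.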

Can someone else comment on whether HOD (Hadoop On Demand) would fit this scenario?

Bill

-----Original Message-----
From: Maciej Trebacz [mailto:maciej.treb...@gmail.com]
Sent: Wednesday, December 02, 2009 4:50 PM
To: common-user@hadoop.apache.org
Subject: Using Hadoop in non-typical large scale user-driven environment

First of all, I'd like to say hi to all people on the list.

I ran across the Hadoop and Cloudera projects recently, and I was immediately intrigued by them, because I'm in the middle of writing a project that will use large scale distributed computing for a degree at my school. It seems like a perfect tool for me to use, but I have some questions to make sure this is the right tool for my needs.

The project I'm building assumes that there is one master node which distributes data and there are several (in theory, hundreds, thousands or more) slave nodes. Up to this point, this is exactly what Hadoop is for. But now comes the tricky part. I want the slaves to be computers that are used by people every day. Think SETI@home. So a user installs the Hadoop client and, ideally, forgets about it, and his computer helps to do the computations. Also, the user will not want to give up much of his hard drive for the computation data.

The problem with this model, as far as I understand, is that users will often shut down their computers (for whatever reason), once a day or even more. Will that be a big problem for the Hadoop server to handle? I mean, I am afraid that most of the processing power and bandwidth will be used for controlling the traffic in the network and it will not be effective.

I would appreciate any opinions on this.

--
Best regards,
Maciej "mav" Trębacz from Poland.