On 18/09/11 03:37, Michael Segel wrote:
2) Data Loss. You can mitigate this as well. Do I need to go through all of the options and DR/BCP planning? Sure, there's always a chance that you have some Luser who does something brain dead. This is true of all databases and systems. (I can probably recount some data loss issues with IBM's Informix and DB2. But that's a topic for another time. ;-)
That raises one more point. Once your cluster grows, it's hard to back it up except to other Hadoop clusters. If you want to survive loss-of-site events (power, communications), you'll need to exchange copies of the high-value data between physically remote clusters. But you may not need to replicate at 3x remotely, because it's only backup data.
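
To put rough numbers on the replication point, here is a back-of-the-envelope sketch. The 100 TB figure and the helper name raw_storage_tb are made up for illustration, and it only assumes that raw disk usage is the logical data size times the replication factor (ignoring compression and filesystem overhead):

```python
def raw_storage_tb(logical_tb: float, replication: int) -> float:
    """Raw disk consumed for `logical_tb` of data at a given replication factor."""
    if replication < 1:
        raise ValueError("replication must be at least 1")
    return logical_tb * replication


# Hypothetical amount of high-value data to copy to the remote cluster.
high_value_tb = 100

for factor in (3, 2, 1):
    raw = raw_storage_tb(high_value_tb, factor)
    print(f"replication {factor}x: {raw:.0f} TB raw at the remote site")
```

Whether 2x or even 1x is acceptable for the remote copy depends on how much you trust the remote disks and how quickly you could re-copy from the primary site.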
-steve