Hi, I am currently doing some research on distributed databases that can be scaled easily in terms of storage capacity.
The reason is to use it on the Brazilian federal project called "Portal do Aluno", which will have around 10 million kids accessing it monthly. The idea is to build a portal similar to Facebook/Orkut, with the main objective of spreading knowledge among kids (6-13 years old).

Now, the problem: those kids will generate a lot of data, including photos, videos, presentations, and school tasks, among other things. In order to have a 100% available system that can also scale to this amount of data (the initial estimate is 10 TB at full use of the portal), a distributed storage engine seems to be the solution. Of the available solutions, I liked Voldemort because it does not seem to have a SPOF (single point of failure) when compared to HBase. However, HBase seems to integrate with more tools and sub-projects.

My first question concerns storing such big items (a 2 MB photo, for example) in HBase. I read on blogs that HBase has high latency, which makes it inappropriate for serving dynamic pages. Will the performance of HBase decrease even more if large binary objects are stored in it?

My other question is related to modelling the data using the key/value pattern. With a relational database, it is just a matter of following a cake recipe and it's done. Do we have such a recipe for key/value? Currently, a lot of code has already been written against a relational database (PostgreSQL), using Hibernate to map the objects.

I will appreciate any comments.

--
"The reality of each place and each era is a collective hallucination." Bloom, Howard
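On the large-binary question, one pattern I have seen suggested (independent of which store is chosen) is to split big blobs into fixed-size chunks so that no single value is large, plus a small manifest key recording how many chunks exist. This is not HBase's or Voldemort's API, just a generic sketch against an assumed get/put store, with a hypothetical 64 KiB chunk size:

```python
# Sketch of chunked blob storage over a generic key/value store.
# `store` stands in for any get/put engine; CHUNK is an assumption.
CHUNK = 64 * 1024  # 64 KiB per value (hypothetical tuning choice)

store = {}

def put_blob(key, data):
    # Manifest key holds the chunk count; chunks go under "<key>:<i>".
    n = (len(data) + CHUNK - 1) // CHUNK
    store[key] = str(n).encode()
    for i in range(n):
        store["%s:%d" % (key, i)] = data[i * CHUNK:(i + 1) * CHUNK]

def get_blob(key):
    # Read the manifest, then reassemble the chunks in order.
    n = int(store[key])
    return b"".join(store["%s:%d" % (key, i)] for i in range(n))

# A fake 2 MB "photo" round-trips through the chunked layout.
photo = b"\x89PNG" + b"\x00" * (2 * 1024 * 1024)
put_blob("photo:456", photo)
restored = get_blob("photo:456")
```

The point is that reads and writes then touch many small values instead of one huge one, which tends to play better with stores tuned for small rows; whether that helps HBase's latency specifically is exactly what I would like to hear about.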
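To make the modelling question concrete, here is my current understanding of the usual key/value "recipe": start from the pages/queries rather than the entities, denormalize around those access paths, store each record as a serialized value under a composite key, and maintain any "index" keys in the application, since there are no JOINs. A minimal Python sketch (the key names and the dict-backed store are my own illustration, not any product's API):

```python
import json

# Stand-in for any key/value store: opaque keys, byte values.
store = {}

def put(key, value):
    store[key] = value

def get(key):
    return store.get(key)

# Relational shape: students(id, name) and photos(id, student_id, data).
# Key/value shape: one key per record, serialized as a blob.
put("student:123", json.dumps({"name": "Ana", "age": 9}).encode())

# Large binaries live under their own keys, referenced by id.
put("photo:456", b"\x89PNG...")  # the raw photo bytes would go here

# Application-maintained "index": the list of a student's photo ids,
# because the store cannot JOIN students to photos for us.
put("student:123:photos", json.dumps([456]).encode())

# Rendering a profile page is then one get per access path.
student = json.loads(get("student:123"))
photo_ids = json.loads(get("student:123:photos"))
photos = [get("photo:%d" % pid) for pid in photo_ids]
```

Is this roughly the accepted approach, or are there better-established modelling guidelines for mapping an existing Hibernate/PostgreSQL schema onto a store like Voldemort or HBase?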