On Sun, Feb 13, 2011 at 8:33 AM, Ramdas S <ram...@gmail.com> wrote:
>
> Thought I will pick your brains on this.
>
> We are archiving a lot of information, some message format very similar
> to email in structure, though it's not an RFC-compliant format. Presently
> we are storing some basic searchable details in a database, and the
> physical file is in a SAN box, with the location of the file also in the
> database. It's fine now, but we are expecting the client to generate a
> few TB of information over the next 2 years.
>
> Does this make a good case for using NoSQL? Also, I remember someone
> saying that NoSQL stuff like MongoDB does miss a document once in a while.
>
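For reference, the setup described in the quoted message (searchable details in a database, the file itself on separate storage, and the file's location recorded in the database) could be sketched roughly as below. This is only an illustration under stated assumptions: SQLite stands in for the database, a plain directory stands in for the SAN, and the table, column, and function names (`messages`, `make_index`, `archive_message`, `find_by_sender`, `load_body`) are hypothetical and not taken from the original post.

```python
import os
import sqlite3
import tempfile


def make_index():
    """Create an in-memory metadata index (stand-in for the real database)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE messages ("
        "msg_id TEXT PRIMARY KEY, sender TEXT, subject TEXT, path TEXT)"
    )
    # Index the field we expect to search on.
    conn.execute("CREATE INDEX idx_sender ON messages (sender)")
    return conn


def archive_message(conn, store_dir, msg_id, sender, subject, body):
    """Write the body to a file and record its searchable details and path."""
    path = os.path.join(store_dir, f"{msg_id}.msg")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(body)
    conn.execute(
        "INSERT INTO messages (msg_id, sender, subject, path) "
        "VALUES (?, ?, ?, ?)",
        (msg_id, sender, subject, path),
    )
    conn.commit()
    return path


def find_by_sender(conn, sender):
    """Search the metadata only; the files themselves are not touched."""
    rows = conn.execute(
        "SELECT msg_id, subject, path FROM messages "
        "WHERE sender = ? ORDER BY msg_id",
        (sender,),
    )
    return rows.fetchall()


def load_body(path):
    """Fetch the archived file from the location stored in the database."""
    with open(path, encoding="utf-8") as fh:
        return fh.read()


if __name__ == "__main__":
    store = tempfile.mkdtemp()  # hypothetical stand-in for the SAN mount
    conn = make_index()
    archive_message(conn, store, "m1", "alice@example.com", "hello", "body one")
    for msg_id, subject, path in find_by_sender(conn, "alice@example.com"):
        print(msg_id, subject, load_body(path))
```

In a layout like this the database holds only small, structured rows, so the bulk of the "few TB" sits in the file store rather than in the database itself.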
Wow, interesting discussion. I believe I missed the crux of it. Anyway,
here are my 2 cents on this.

Document storage, query and retrieval always have 3 aspects: data
consistency, data availability and data distribution (partitioning).

Traditional RDBMS (aka SQL) databases focus mainly on the first one, i.e.
data consistency, hence terms like 'ACID compliant'. On top of structured
data that always promises you consistency, they provide a structured query
language, aka SQL. In this world, availability and partitioning are always
add-ons: the painful work the DBA has to do with DB replication,
mirroring, modifying the schema for partitioning, clustered DBs etc.

The new generation of DBs instead chose to focus on the latter two, i.e.
availability and distribution, while relaxing some assumptions on the
consistency part. They are natural evolutions of the data grid or cloud
architecture, where data is massively striped and scaled onto multiple
nodes in multiple data centers, thereby keeping data retrieval latency
very low. Hence they are a natural fit if you don't mind some
inconsistency in data retrieved by, say, 2 clients in different
geographical locations at the same time, but are more concerned about how
quickly the data is stored and retrieved.

These DBs also choose not to use SQL, hence the "NoSQL" term. The reason
is that they don't need SQL, since the focus is not so much on queries
that span and join across multiple tables as on a fast fetch, given a key.

There are some problems which fit the NoSQL world and some which fit the
SQL one. If you are a bank, you would not dream of giving up data
consistency, since data correctness and atomic transactions are essential
in the financial world. Twitter or Google, on the other hand, can live
with some minor inconsistencies but need fast response times, so
map/reduce and NoSQL DBs are a natural fit there.

So an approximate rule of thumb would be:

1. If your data is highly structured, you have complex queries and your
   clients expect consistent results, stick to an RDBMS.
2. If your data is more like a simple key-value store and you are more
   worried about query/response times than about the consistency of the
   data, perhaps a document storage (NoSQL) design is the correct one.

To me these two worlds are complementary to each other. I don't believe
in the so-called 'SQL' vs 'NoSQL' wars. That is simply media hype.

--Anand
_______________________________________________
BangPypers mailing list
BangPypers@python.org
http://mail.python.org/mailman/listinfo/bangpypers