Martin,

On Wed, Feb 9, 2011 at 4:39 PM, Martin Holst Swende <mar...@swende.se> wrote:
> On 02/09/2011 02:31 AM, Steve Pinkham wrote:
>
> On 02/08/2011 08:08 PM, Andres Riancho wrote:
>
> Steve,
>
> noSQL servers are usually fast because they are in-memory systems.
> sqlite can be used in that mode also if you like.
>
> mongodb is not an in-memory db!
>
> In practice, it is.  It stores all indexes in memory and uses memory
> mapped files. It will automatically consume all available memory (which
> is a good thing or bad thing depending on what else you want to use the
> server for).
>
> http://www.mongodb.org/display/DOCS/Indexing+Advice+and+FAQ#IndexingAdviceandFAQ-MakesureyourindexescanfitinRAM.
>
> http://www.mongodb.org/display/DOCS/Caching
>
> Hi all,
>
> I have to say I disagree with calling MongoDB a memory-db. There are such
> things as in-memory databases, e.g. H2, and MongoDB is not one of them. These
> databases keep *all* data in memory, which is another matter than using
> memory for indices (which are orders of magnitude smaller than the data) and
> caching (which I would guess all daemon-mode databases try to do as well
> as possible).
>
> I also disagree that they are "usually fast because they are in-memory
> systems". They are usually fast because they basically let the 'C' in
> Brewer's CAP theorem suffer, that is to say, they do not enforce
> consistency across all nodes. This allows for better partition tolerance and
> availability. They often employ "eventual consistency". An example of an
> eventually consistent system is the Internet's DNS. Individual nodes
> (DNS servers) may give stale information about a hostname, but eventually
> updates will reach all nodes and the system will be consistent again. Why is
> this important? As in the DNS example, such a system can be built without any
> locks on readers or writers. Since it is ok for a reader to get 'stale'
> information, a writer can create a new version of a record first, then
> update the pointer. Neither reader nor writer has to wait.
>
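
A minimal sketch of the "create a new version first, then update the pointer" idea described above. The class name `VersionedCell` is hypothetical and not taken from any database; it assumes a single writer and relies on a Python attribute rebind being one atomic reference swap.

```python
class VersionedCell:
    """Holds a reference to an immutable (version, value) snapshot.

    Hypothetical illustration only: assumes a single writer.
    """

    def __init__(self, value):
        # The snapshot tuple is never mutated after creation.
        self._current = (0, value)

    def read(self):
        # Readers take no lock; they may see a slightly stale snapshot.
        return self._current

    def write(self, value):
        version, _ = self._current
        # Build the new version first, then swap the pointer in one assignment.
        self._current = (version + 1, value)


cell = VersionedCell("192.0.2.1")
stale = cell.read()        # a reader grabs the current snapshot
cell.write("192.0.2.99")   # the writer publishes a new version
fresh = cell.read()
```

The reader that obtained `stale` keeps a consistent, if outdated, view, which is the trade-off the DNS example illustrates.
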
> Some databases, such as CouchDB, have gone further, using MVCC with
> built-in, git-like revision tracking to handle simultaneous modification of
> data on several nodes. It also uses an append-only storage file to further
> reduce locking at the file level (which also makes file corruption much
> less likely).
>
> Having said all this, I concur that using e.g. MongoDB for w3af is probably
> not necessary; it sounds strange that sqlite would be unable to handle the
> somewhat modest amounts of data we're talking about. Also, I can see why
> there are concerns about switching to a daemon-mode database at all. That
> depends entirely on the purpose of w3af: if the purpose is to be a
> good scanner which is easy to use and install, a daemon db is a bad choice. If
> the purpose is to be the best, regardless of ease of installation and use,
> then I wouldn't blink before switching to a daemon database if that gives
> any advantage.
>
> Two more comments I disagree with:
>
> "It's useful in distributed, massively parallel systems, but offers no real
> benefit for single user databases."
> and
> "noSQL is just the new term for key-value stores."
>
> It is true that it is useful for distributed, massively parallel systems, but
> there are also advantages to using it for data which fits the dynamic
> (schemaless) model. Having no schema enforced by the database does not mean
> that the database is just a disk-based hash table with blobs for values. I
> would instead say that noSQL is more like a new generation of the object
> databases, but now with generic APIs (json/bson/http) and wide language
> support. Certain kinds of data fit very well into these models.
>
> I have written a proxy which saves HTTP traffic into a MongoDB database
> (http://martin.swende.se/hg/#hatkit_proxy-t1/) and a framework to analyse
> traffic from this database (http://martin.swende.se/hg#hatkit_fiddler-t1).
> HTTP traffic looks very non-uniform. Some requests are basically "GET /
> HTTP/1.1" while others contain forms or json and lots and lots of headers.
> Using MongoDB, it is possible to represent the data more at an object level,
> e.g.
> { request:
>     { method: "GET",
>       headers: { "Content-Length": 1233, Host: "foobar.com", Foo: "bar" },
>       parameters: { gaz: "onk" }
>     },
>     response: {...}
> }
>
> MongoDB has very powerful querying facilities
> (http://www.mongodb.org/display/DOCS/Advanced+Queries). Since the object is
> stored with this structure in the database, it is possible to reach into
> objects
> (http://www.mongodb.org/display/DOCS/Dot+Notation+(Reaching+into+Objects)),
> and perform e.g. these kinds of queries:
>
> "give me response.body where request.parameters.filename exists", or "give
> me request.body.parameters where request.body.parameters.__viewstate does
> not exist"
>
> Also, MongoDB has very powerful aggregation mechanisms
> (http://www.mongodb.org/display/DOCS/Aggregation), where queries like the
> following can be used:
> "Organized by request.headers.host give me all unique parameter names.", or,
> "organized by request.url.path, give me all unique response header keys". To
> generate these, you create javascript 'reduce' functions which are executed
> inside the database.
>
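
The first aggregation above ("organized by host, give me all unique parameter names") can be sketched in plain Python as a group-and-reduce step. In MongoDB this work would run inside the database via JavaScript reduce functions, as described; the function name and sample data below are hypothetical, and the header key is assumed to be stored as `Host`.

```python
from collections import defaultdict


def unique_parameter_names_by_host(docs):
    """Group documents by request.headers.Host and collect parameter names."""
    grouped = defaultdict(set)
    for doc in docs:
        request = doc.get("request", {})
        host = request.get("headers", {}).get("Host")
        # The "reduce" step: merge this request's parameter names into the set.
        grouped[host].update(request.get("parameters", {}).keys())
    return {host: sorted(names) for host, names in grouped.items()}


# Hypothetical sample traffic.
sample_traffic = [
    {"request": {"headers": {"Host": "foobar.com"},
                 "parameters": {"gaz": "onk"}}},
    {"request": {"headers": {"Host": "foobar.com"},
                 "parameters": {"gaz": "x", "id": "7"}}},
    {"request": {"headers": {"Host": "example.org"}, "parameters": {}}},
]

summary = unique_parameter_names_by_host(sample_traffic)
```
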
> Another reason, besides being very dynamic, why HTTP traffic is not
> necessarily particularly suited for SQL, is that it is pretty much
> non-relational. Relational databases are good for relational data, e.g.
> where you have Employees, which have Roles, and are sorted into different
> Offices etc. etc., where all the data is heavily related to other data. HTTP
> traffic is basically request and response. No relations to much else.
>
> Oh, and one more thing about indices. When I started with the
> hatkit_fiddler, I decided to wait with adding indices to see where they were
> needed. So far, I haven't felt the need to add any indices at all; it's fast
> enough anyway...
>
> So, I believe that some really cool things could come from a switch to
> MongoDB. But I am not convinced that performance should be a driving reason
> for such a switch.

    I think this is one of the best posts this mailing list has had.
Thanks for the very clear explanation. I'll read it carefully again
when considering the change to a noSQL DB.

> Regards,
> /Martin Holst Swende
>



-- 
Andrés Riancho
Director of Web Security at Rapid7 LLC
Founder at Bonsai Information Security
Project Leader at w3af
