Hi Malik, I think what the docs intended to say is that writes to a normalized RDBMS may be slow because maintaining indices takes time, and normalization assumes indices. Django websites are usually read-intensive while Scrapy projects are usually write-intensive, so this is more of a problem for Scrapy projects. If you know what you are doing at the DB level, you'll be fine. Maybe the docs should be reworded; I agree that they are too "pro-nonrel". Pull requests are welcome!
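To make the index point concrete, here is a rough sketch that times the same bulk insert with and without secondary indices. It assumes plain SQLite through Python's standard sqlite3 module (not the Django ORM or DjangoItem), and the table, column, and function names are made up for illustration. Actual timings depend heavily on the database engine, hardware, and data, so treat it as a way to measure on your own setup and not as a benchmark.

```python
import sqlite3
import time


def timed_bulk_insert(n_rows, with_indices):
    """Insert n_rows into an in-memory SQLite table.

    Returns (elapsed_seconds, row_count). The schema is hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE page (id INTEGER PRIMARY KEY, url TEXT, title TEXT, body TEXT)"
    )
    if with_indices:
        # Every secondary index has to be updated on each insert.
        conn.execute("CREATE INDEX idx_page_url ON page (url)")
        conn.execute("CREATE INDEX idx_page_title ON page (title)")
        conn.execute("CREATE INDEX idx_page_body ON page (body)")

    # Scramble the URL order a little so index inserts are not purely sequential.
    rows = [
        (f"http://example.com/{(i * 7919) % n_rows}", f"title {i}", f"body {i}")
        for i in range(n_rows)
    ]

    start = time.perf_counter()
    with conn:  # one transaction for the whole batch, committed on exit
        conn.executemany(
            "INSERT INTO page (url, title, body) VALUES (?, ?, ?)", rows
        )
    elapsed = time.perf_counter() - start

    count = conn.execute("SELECT COUNT(*) FROM page").fetchone()[0]
    conn.close()
    return elapsed, count


if __name__ == "__main__":
    for with_indices in (False, True):
        seconds, count = timed_bulk_insert(50_000, with_indices)
        print(f"indices={with_indices}: {count} rows in {seconds:.3f}s")
```

If the indexed run turns out noticeably slower for your data, the usual options are to batch writes into fewer transactions or to keep only the indices your read queries actually need.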
On Tuesday, March 11, 2014, 2:54:48 UTC+6, Malik Rumi wrote:
>
> Hello all. I'm new here and this is my first post to this group.
> I was studying the Scrapy docs (it's a slow day at work...;-) when I came
> across this:
>
> DjangoItem caveats
>
> DjangoItem is a rather convenient way to integrate Scrapy projects with
> Django models, but bear in mind that Django ORM may not scale well if you
> scrape a lot of items (ie. millions) with Scrapy. This is because a
> relational backend is often not a good choice for a write intensive
> application (such as a web crawler), specially if the database is highly
> normalized and with many indices.
>
> http://doc.scrapy.org/en/latest/topics/djangoitem.html
>
> Say what? Explain, please!
>
> Now, I did keep looking, and found
> https://groups.google.com/forum/#!searchin/scrapy-users/DjangoItem/scrapy-users/HsDJ-jM7LvM/ESRlGF6QXcIJ
> "Using DjangoItem step-by-step guide" and the SO post from which it
> comes. Is the max_locks issue that was brought up there the reason for the
> caveat? (Note: I can't access github from work - it's blocked - go figure).
>
> My project is text heavy (government documents) and I need a solid
> database to store my results in. And yes, of course I want to scale. A good
> database is *always* normalized, so what are we talking about here? If
> you are saying don't use an RDBMS for big projects, are you just as well
> saying don't use Django ORM for big projects? Because the way the caveat is
> worded, it talks about Django per se, not DjangoItem. (And no, I am not
> inviting a debate about nonrel).
>
> Or should I just not use DjangoItem and follow Chris' advice on SO: "I
> ended up not using DjangoItem at all which solved all my problems"?
>
> As much clarity, detail, and yes, caveats as you can enlighten me with
> would be GREATLY appreciated.
