Hi Malik,

I think what the docs intended to say is that writes to a normalized RDBMS 
may be slow because maintaining indices takes time, and a normalized schema 
generally relies on indices. Django websites are usually read-intensive while 
Scrapy projects are usually write-intensive, so this is more of a problem for 
Scrapy projects. If you know what you are doing at the DB level you'll be fine. 
Maybe the docs should be reworded - I agree that they are too "pro-nonrel". 
Pull requests are welcome!
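
To make the "DB level" part a bit more concrete, here is a minimal sketch of 
two common ways to keep write cost down: commit in batches instead of once per 
row, and build the index after the bulk load so it is not updated on every 
insert. It uses the stdlib sqlite3 module only so that it runs anywhere; the 
table name "page", the index name and the bulk_load helper are made up for 
illustration and are not part of Scrapy or DjangoItem.

```python
import sqlite3


def bulk_load(conn, rows, batch_size=1000):
    """Insert (url, body) rows in batches and return how many were inserted.

    Hypothetical helper for illustration; not a Scrapy or Django API.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS page (url TEXT, body TEXT)")
    inserted = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        # One transaction (and one commit) per batch rather than per row.
        with conn:
            conn.executemany(
                "INSERT INTO page (url, body) VALUES (?, ?)", batch
            )
        inserted += len(batch)
    # Create the index after loading, so inserts did not have to maintain it.
    with conn:
        conn.execute("CREATE INDEX IF NOT EXISTS page_url_idx ON page (url)")
    return inserted


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    rows = [("http://example.com/%d" % i, "text %d" % i) for i in range(2500)]
    print(bulk_load(conn, rows))
```

In a real Scrapy project the same idea would probably live in an item pipeline 
that buffers items and flushes them in batches. How much it helps depends on 
your database and schema, so measure before relying on it.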

On Tuesday, March 11, 2014, 2:54:48 UTC+6, Malik Rumi wrote:
>
> Hello all. I'm new here and this is my first post to this group.
> I was studying the Scrapy docs (it's a slow day at work...;-) when I came 
> across this:
>
> DjangoItem caveats 
>
> DjangoItem is a rather convenient way to integrate Scrapy projects with 
> Django models, but bear in mind that Django ORM may not scale well if you 
> scrape a lot of items (ie. millions) with Scrapy. This is because a 
> relational backend is often not a good choice for a write intensive 
> application (such as a web crawler), specially if the database is highly 
> normalized and with many indices.
>
> http://doc.scrapy.org/en/latest/topics/djangoitem.html 
>
> Say what? Explain, please!
>
> Now, I did keep looking, and found 
> https://groups.google.com/forum/#!searchin/scrapy-users/DjangoItem/scrapy-users/HsDJ-jM7LvM/ESRlGF6QXcIJ
>  "Using DjangoItem step-by-step guide" and the SO post from which it 
> comes. Is the max_locks issue that was brought up there the reason for the 
> caveat? (Note: I can't access github from work - it's blocked - go figure).
>
> My project is text heavy (government documents) and I need a solid 
> database to store my results in. And yes, of course I want to scale. A good 
> database is *always* normalized, so what are we talking about here? If 
> you are saying don't use an RDBMS for big projects are you just as well 
> saying don't use django ORM for big projects? Because the way the caveat is 
> worded, it talks about django per se, not djangoitems. (And no, I am not 
> inviting a debate about nonrel).
>
> Or should I just not use djangoitems and follow Chris' advice on SO "I 
> ended up not using DjangoItem at all which solved all my problems"?
>
> As much clarity, detail, and yes, caveats as you can enlighten me with 
> would be GREATLY appreciated. 
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"scrapy-users" group.
Visit this group at http://groups.google.com/group/scrapy-users.