I've posted this problem <http://stackoverflow.com/questions/34330372/scrapy-different-spider-for-different-type-item> on stackoverflow.com; the content is below.
I think the Scrapy framework <http://scrapy.org/> might be a little inflexible, and I can't find a good solution for my issue. Here's the issue I'm facing.

There's a website, let's say http://example.com/, and I want to scrape some information from it. It has many items whose URLs are of the form http://example.com/item/([0-9]+). For now I *have* the list of valid ([0-9]+) values, about *3 million* index ids, so scraping the whole site might seem to be a simple task.

*But* the structure of the task is like this:

- There is a lot of data about the item on the /item/ page. I want this information, and this is simple to achieve.
- There are links that refer to entities related to the item, for example the item owner with link path /owner/, or the collections the item belongs to with link path /collection/, and so on. I want all the *unique* information about these entities, which is hard to achieve. They shouldn't be nested inside the item or scraped by a single spider, for the reasons below:
  - A *single* owner has [1-n] items.
  - A *single* item has [1-n] owners.
  - The same holds between collections and items.
- There are links that refer to other entities related to the item, for example comments with link path /comment/ or users who like it with link path /user/.

Obviously, it's wise to split comment or user information away from the item and use a *key or index* to refer to the entity. This is hard to achieve with a single spider.

So I would prefer to start one spider to handle the list of http://example.com/item/([0-9]+), and use other types of spiders to handle item owner, collection, comment, and user respectively. *But* the problem is that I don't have the list of item owners, collections, comments, and users. I can reach all of these entities only by iterating over the pages of http://example.com/item/([0-9]+).

I have googled a lot but found no solution that fits my issue. Please feel free to share your opinion.
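
To make the target layout described above concrete, here is a minimal sketch in plain Python (no Scrapy involved) of entities stored once per id and referenced from items by key. It is only an illustration under assumptions: the names `classify`, `EntityStore`, and `ENTITY_PATTERN`, the sample pages, and the idea that every link path ends in a numeric id (e.g. /owner/10) are all hypothetical and not taken from the real site.

```python
import re
from dataclasses import dataclass, field

# Assumption: every entity link looks like /<type>/<numeric id>.
ENTITY_PATTERN = re.compile(r"^/(item|owner|collection|comment|user)/([0-9]+)$")


def classify(path):
    """Return (entity_type, entity_id) for a known link path, else None."""
    match = ENTITY_PATTERN.match(path)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


@dataclass
class EntityStore:
    # One table per entity type, keyed by id, so each entity is kept once.
    tables: dict = field(default_factory=dict)
    # (item_id, entity_type, entity_id) triples: items refer to entities by key.
    links: set = field(default_factory=set)

    def add(self, entity_type, entity_id, data):
        """Store an entity unless it was already seen; return True if new."""
        table = self.tables.setdefault(entity_type, {})
        if entity_id in table:
            return False
        table[entity_id] = data
        return True

    def link(self, item_id, entity_type, entity_id):
        """Record that an item page referenced the given entity."""
        self.links.add((item_id, entity_type, entity_id))


# Made-up sample: links found on two item pages that share one owner.
pages = {
    1: ["/owner/10", "/collection/7"],
    2: ["/owner/10", "/owner/11", "/about"],
}

store = EntityStore()
for item_id, hrefs in pages.items():
    store.add("item", item_id, {"id": item_id})
    for href in hrefs:
        hit = classify(href)
        if hit is None:
            continue  # not an entity link
        entity_type, entity_id = hit
        # In a real crawl this is where the entity page would be fetched,
        # and only when add() reports the entity as new.
        store.add(entity_type, entity_id, {"id": entity_id})
        store.link(item_id, entity_type, entity_id)
```

In this sketch owner 10 appears on both item pages but is stored only once, while the many-to-many relation between items and owners lives in `store.links` rather than inside the item records.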
