I've posted this problem on Stack Overflow:
<http://stackoverflow.com/questions/34330372/scrapy-different-spider-for-different-type-item>

The content is below.

I think the Scrapy framework <http://scrapy.org/> might be a little 
inflexible, and I can't find a good solution for my issue.

Here's the issue I'm facing now.

There's a website, let's say http://example.com/. I want to scrape 
some information from it.

It has many items, whose URLs have the form http://example.com/item/([0-9]+). 
I already *have* the list of valid ([0-9]+) values, about *3 million* 
index IDs, so scraping all of the pages might seem to be a simple task.

*But* the structure of the task is like this:

   - There is a lot of data about the item on the /item/ page. I want this 
   information, and it is simple to get.
   - There are links to entities related to the item, for example the item 
   owner with link path /owner/, or the collections the item belongs to with 
   link path /collection/, and so on. I want the *unique* information for 
   each of these entities, which is hard to achieve. They shouldn't be nested 
   inside the item or scraped by a single spider, for the reasons below:
      - a *single* owner has [1-n] items.
      - a *single* item has [1-n] owners.
      - the same holds between collections and items.
   - There are links to other entities related to the item, for example 
   comments with link path /comment/, or users who liked it with link path 
   /user/. Obviously, it's wise to split comment and user information 
   away from the item and use a *key or index* to refer to each entity. This 
   is hard to achieve with a single spider.
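
To make the relationships in the list above concrete, here is a minimal sketch of the data layout I have in mind, in plain Python rather than Scrapy code. The class names, field names, and the EntityStore helper are assumptions made up for illustration; the real pages may carry different fields.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    # Hypothetical fields; an item refers to owners and collections by id only.
    item_id: int
    title: str
    owner_ids: list = field(default_factory=list)
    collection_ids: list = field(default_factory=list)


@dataclass
class Owner:
    # Hypothetical fields for the entity behind an /owner/ link.
    owner_id: int
    name: str


class EntityStore:
    """Keeps one record per key, so an entity seen on many pages is stored once."""

    def __init__(self):
        self.records = {}

    def add(self, key, record):
        # Return True only the first time this key is seen.
        if key in self.records:
            return False
        self.records[key] = record
        return True


items = EntityStore()
owners = EntityStore()

# Two items share owner 10: the owner is stored once and both items
# refer to it by id instead of nesting a copy of the owner record.
items.add(1, Item(1, "first", owner_ids=[10]))
items.add(2, Item(2, "second", owner_ids=[10, 11]))
for owner in (Owner(10, "alice"), Owner(10, "alice"), Owner(11, "bob")):
    owners.add(owner.owner_id, owner)
```

Keeping owners in their own store, keyed by id, is what lets the many-to-many relation between items and owners be expressed without duplicating owner data.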

So I would prefer to start one spider to handle the list of 
http://example.com/item/([0-9]+) pages, and use other types of spiders to 
handle item owners, collections, comments, and users respectively.

*But* the problem is that I don't have the lists of item owners, collections, 
comments, and users. I can only discover these entities by iterating over 
the http://example.com/item/([0-9]+) pages.
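
The discovery step described above can be sketched in plain Python as follows. This is only an illustration under assumptions: I assume the entity links look like /owner/<digits>, /collection/<digits> and so on (the exact id format is not something I have stated), and the function name, regular expression, and sample HTML are hypothetical.

```python
import re
from urllib.parse import urljoin

# Assumed link shape: href="/<kind>/<numeric id>". The real site may differ.
ENTITY_LINK = re.compile(r'href="(/(owner|collection|comment|user)/(\d+))"')


def discover_entities(base_url, html, seen):
    """Yield (kind, absolute_url) for each entity link not seen before."""
    for path, kind, entity_id in ENTITY_LINK.findall(html):
        key = (kind, entity_id)
        if key in seen:
            continue  # already discovered on an earlier item page
        seen.add(key)
        yield kind, urljoin(base_url, path)


# Made-up item pages: both link to the same owner.
sample_page_1 = '<a href="/owner/10">A</a> <a href="/collection/7">C</a>'
sample_page_2 = '<a href="/owner/10">A</a> <a href="/user/99">U</a>'

seen = set()
found = []
for html in (sample_page_1, sample_page_2):
    found.extend(discover_entities("http://example.com/", html, seen))
```

Here `found` ends up with one entry per distinct entity, even though owner 10 appears on both pages. The sketch covers only the discovery and de-duplication; it does not show how to hand the discovered URLs to separate spiders, which is the part I am asking about.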

I have googled a lot but found no solution that fits my issue. Please feel 
free to share your opinion.
