It is a React site, so the DOM is built dynamically. You get 6 because, if 
you check the page source, there are only 6 articles with that class. 
Scrapy only sees the raw HTML response; what you were seeing in the browser 
is the DOM generated by JavaScript. As a rule of thumb, always double-check 
that whatever you find in Chrome DevTools is also present in the raw page 
source.
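
A quick way to run that check outside the browser is to count the matching 
elements in the raw HTML yourself. This is only a rough sketch using the 
standard library; the sample markup is made up for illustration, and 
count_in_raw_html is a hypothetical helper, not part of Scrapy.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Counts start tags that have a given tag name and CSS class."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value can be None.
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self.count += 1


def count_in_raw_html(html, tag, css_class):
    """Hypothetical helper: how many <tag class="css_class"> are in html."""
    counter = ClassCounter(tag, css_class)
    counter.feed(html)
    return counter.count


# Made-up stand-in for a raw response body: two server-rendered items,
# plus an empty container that JavaScript would fill in later.
sample_html = """
<html><body>
<article class="npr-product-module">one</article>
<article class="npr-product-module featured">two</article>
<div id="root"></div>
</body></html>
"""

print(count_in_raw_html(sample_html, "article", "npr-product-module"))
```

If the number you get from the raw source is lower than what DevTools 
shows, the rest is being added by JavaScript after the page loads.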

I do this all the time to find out whether I will have to run json.loads() 
on what is most likely a raw string inside the HTML source that contains 
the data.

From my experience, tons of these sites are moving to React, so I have 
started looking for the data inside the <script> tags.
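
Here is a minimal sketch of that idea, assuming the page embeds its data as 
a JSON object assigned to a JavaScript variable. The variable name 
window.__INITIAL_STATE__, the sample HTML, and the extract_state helper are 
all assumptions for illustration; the real page may use a different name 
and structure, so check its source first.

```python
import json
import re

# Made-up stand-in for a raw response body with data embedded in a script.
sample_html = """
<html><body>
<div id="root"></div>
<script>window.__INITIAL_STATE__ = {"products": [{"name": "Dress A", "url": "/s/1"}, {"name": "Dress B", "url": "/s/2"}]};</script>
</body></html>
"""

# Assumed variable name; adjust it to whatever the real page source uses.
STATE_RE = re.compile(r"window\.__INITIAL_STATE__\s*=\s*")


def extract_state(html):
    """Hypothetical helper: return the embedded JSON as Python data, or None."""
    match = STATE_RE.search(html)
    if match is None:
        return None
    # raw_decode parses one JSON value starting at the given index and
    # ignores whatever follows it (here, the trailing ";</script>").
    data, _end = json.JSONDecoder().raw_decode(html, match.end())
    return data


state = extract_state(sample_html)
for product in state["products"]:
    print(product["name"], product["url"])
```

In a spider you would pass response.text to a helper like this inside 
parse() and yield items from the resulting dict instead of using CSS 
selectors.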

On Wednesday, October 26, 2016 at 9:35:38 PM UTC-7, ignorant wrote:
>
> Hi there,
>
> I am a noob and trying to test this on different product grids. I am not 
> able to get more than a few (6 to 8) items per page.
>
> For example, 
>
> import scrapy
>
>
> class NordstromSpider(scrapy.Spider):
>     name = "nordstrom"
>     start_urls = [
>         'http://shop.nordstrom.com/c/womens-dresses-new?origin=leftnav&cm_sp=Top%20Navigation-_-New%20Arrivals'
>     ]
>
>
>     def parse(self, response):
>         for dress in response.css('article.npr-product-module'):
>             yield {
>                 'src': dress.css('img.product-photo').xpath('@src').extract_first(),
>                 'url': dress.css('a.product-photo-href').xpath('@href').extract_first()
>             }
>
>
>     def noparse(self, response):
>         page = response.url.split("/")[-2]
>         filename = 'nordstrom-%s.html' % page
>         with open(filename, 'wb') as f:
>             f.write(response.body)
>         self.log('Saved file %s' % filename)
>
>
>
> This gave only 6 items. So I tried another site -
>
> import scrapy
>
>
> class QuotesSpider(scrapy.Spider):
>     name = "rtr"
>     start_urls = [
>         'https://www.renttherunway.com/products/dress'
>     ]
>
>
>     def parse(self, response):
>         for dress in response.css('div.cycle-image-0'):
>             yield {
>                 'image-url': dress.xpath('.//img/@src').extract_first(),
>             }
>
>
>
> This only gave 12 items even though the page has a lot more.
> I am guessing that I'm missing a setting somewhere. Any pointers are 
> appreciated.
>
> Thanks,
>
