It is a React site, so the DOM is changed dynamically by JavaScript. You get 6 items because the raw page source contains only 6 articles with that class. Scrapy only sees the raw HTML response; what you were seeing in the browser is the DOM generated by JavaScript. As a rule of thumb, always double-check that what you find in Chrome DevTools can also be found in the raw page source.
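A minimal, stdlib-only sketch of that double-check: count the matching elements in the raw HTML (what Scrapy receives) and compare the result with the count DevTools shows. In `scrapy shell <url>` the equivalent is simply `len(response.css('article.npr-product-module'))`. The `count_in_raw_html` helper and the sample HTML below are hypothetical illustrations, not part of Scrapy.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Counts start tags with a given name that carry a given CSS class."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag != self.tag:
            return
        # attrs is a list of (name, value) pairs; value may be None
        classes = (dict(attrs).get("class") or "").split()
        if self.css_class in classes:
            self.count += 1


def count_in_raw_html(html, tag, css_class):
    """Return how many <tag class="... css_class ..."> elements the raw HTML contains."""
    parser = ClassCounter(tag, css_class)
    parser.feed(html)
    parser.close()
    return parser.count


# Hypothetical raw HTML as a server might send it: two articles are present,
# the rest would only be added later by JavaScript in the browser.
raw = """
<div id="grid">
  <article class="npr-product-module">A</article>
  <article class="npr-product-module featured">B</article>
  <div id="react-root"></div>
</div>
"""
print(count_in_raw_html(raw, "article", "npr-product-module"))  # → 2
```

If the raw count is much lower than what DevTools shows, the remaining items are rendered client-side, and changing Scrapy settings will not make the selector find them, because Scrapy does not execute JavaScript.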
I do this check all the time to find out whether I'll have to run json.loads() on what is most likely a raw string inside the HTML source that contains the data. In my experience, tons of these sites are moving to React, so I started looking for the data inside the <script> tags.

On Wednesday, October 26, 2016 at 9:35:38 PM UTC-7, ignorant wrote:
> Hi there,
>
> I am a noob and trying to test this on different product grids. I am not
> able to get more than a few (6 to 8) items per page.
>
> For example,
>
> import scrapy
>
>
> class NordstromSpider(scrapy.Spider):
>     name = "nordstrom"
>     start_urls = [
>         'http://shop.nordstrom.com/c/womens-dresses-new?origin=leftnav&cm_sp=Top%20Navigation-_-New%20Arrivals'
>     ]
>
>     def parse(self, response):
>         for dress in response.css('article.npr-product-module'):
>             yield {
>                 'src': dress.css('img.product-photo').xpath('@src').extract_first(),
>                 'url': dress.css('a.product-photo-href').xpath('@href').extract_first()
>             }
>
>     def noparse(self, response):
>         page = response.url.split("/")[-2]
>         filename = 'nordstrom-%s.html' % page
>         with open(filename, 'wb') as f:
>             f.write(response.body)
>         self.log('Saved file %s' % filename)
>
> This gave only 6 items. So I tried another site -
>
> import scrapy
>
>
> class QuotesSpider(scrapy.Spider):
>     name = "rtr"
>     start_urls = [
>         'https://www.renttherunway.com/products/dress'
>     ]
>
>     def parse(self, response):
>         for dress in response.css('div.cycle-image-0'):
>             yield {
>                 'image-url': dress.xpath('.//img/@src').extract_first(),
>             }
>
> This only gave 12 items even though the page has a lot more.
> I am guessing that I'm missing a setting somewhere. Any pointers are
> appreciated.
>
> Thanks,
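A minimal sketch of the <script>-tag approach, assuming the page serializes its initial data as `window.__INITIAL_STATE__ = {...};`. That variable name is a hypothetical placeholder (each site uses its own, so search the raw source for it), and `extract_embedded_json` and the sample page are likewise illustrative. In a spider you would pass `response.text` to the helper.

```python
import json
import re

# Hypothetical marker: the real variable name differs from site to site.
STATE_MARKER = re.compile(r"window\.__INITIAL_STATE__\s*=\s*")


def extract_embedded_json(html, marker=STATE_MARKER):
    """Return the JSON object that follows `marker` in the raw HTML, or None."""
    match = marker.search(html)
    if match is None:
        return None
    start = html.find("{", match.end())
    if start == -1:
        return None
    # raw_decode parses one complete JSON value and reports where it ended,
    # so trailing text such as ";</script>" is simply ignored.
    obj, _end = json.JSONDecoder().raw_decode(html[start:])
    return obj


# Hypothetical page: the product grid is empty in the HTML, but the data
# that React would render is embedded in a <script> tag.
page = """
<html><body>
<div id="root"></div>
<script>window.__INITIAL_STATE__ = {"products": [{"name": "Dress A", "url": "/a"}, {"name": "Dress B", "url": "/b"}]};</script>
</body></html>
"""
state = extract_embedded_json(page)
for product in state["products"]:
    print(product["name"], product["url"])
# Dress A /a
# Dress B /b
```

`json.JSONDecoder.raw_decode` is used here instead of a brace-matching regex because it stops at the end of the first complete JSON value, which avoids having to strip the surrounding JavaScript by hand.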
