Hi,

In my settings.py file I was using:

    USER_AGENT = 'Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.93 Safari/537.36'

It appears that Scrapy just didn't see the class attributes on the <li> elements.

My workaround was:

    //ol[@class="breadcrumb container"]/li[position() > 1 and position() < last(

On Saturday, August 29, 2015 at 10:23:42 PM UTC+3, Ashish Meena wrote:
>
> Hi,
>
> It is possible that a different page is fetched for different browsers.
> What value are you using for the USER_AGENT setting in Scrapy? Could you
> try setting it to the same value your web browser uses?
>
> Regards,
> Ashish
>
> On Sat, Aug 29, 2015 at 4:49 PM, netcrime <[email protected]> wrote:
>
>> Hello,
>>
>> Background: I need to get the product category from the breadcrumbs.
>> For example, given the breadcrumb Home > Books > Bookname, I need to
>> get only Books.
>>
>> HTML code:
>>
>> <ol class="breadcrumb container">
>>   <li class="first"><a href="http://xxxx.com/index.php?route=common/home"><span>Home</span></a></li>
>>   <li><a href="http://xxxx.com/books"><span>Books</span></a></li>
>>   <li class="last"><a href="http://xxxxx.com/books?product_id=193" class="last"><span>My Vision : Challenges in the Race for Excellence - Mohammed Bin Rashid Al Maktoum</span></a></li>
>> </ol>
>>
>> The XPath I use in the browser console, which returns the correct value
>> "Books":
>>
>> //ol[@class="breadcrumb container"]/li[not(contains(@class,"first")) and not(contains(@class,"last"))]/a/span/text()
>>
>> My Python code:
>>
>> for cat in sel.xpath('//ol[@class="breadcrumb container"]/li[not(contains(@class,"first")) and not(contains(@class,"last"))]/a/span/text()').extract():
>>     categories[catIndex] = cat
>>     catIndex += 1
>>
>> When I run my Scrapy spider, it returns all 3 <li> elements, including
>> Home (with class first) and the book name (with class last).
>>
>> I tried running scrapy view http://xxx.com to see the page the way the
>> spider sees it, and the XPath works correctly there.
>>
>> http://prntscr.com/8a7a4u
>>
>> But when I run scrapy shell and try the XPath there, it returns all 3
>> <li> elements.
>>
>> http://prntscr.com/8a77xe
>>
>> So does anyone have an idea what the problem might be?
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "scrapy-users" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at http://groups.google.com/group/scrapy-users.
>> For more options, visit https://groups.google.com/d/optout.
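
For anyone finding this thread later, here is a minimal sketch of the idea behind the workaround above: pick the middle breadcrumb items by position instead of by class, so the result does not depend on whether the class attributes are present in the fetched page. To keep it self-contained it uses the standard-library ElementTree parser on the breadcrumb markup from the original post rather than Scrapy's selectors, and the helper name middle_breadcrumbs is hypothetical, not part of Scrapy.

```python
import xml.etree.ElementTree as ET

# Breadcrumb markup copied from the original post. It happens to be
# well-formed, so the stdlib XML parser can read it; a real page would
# normally go through Scrapy's own selectors instead.
BREADCRUMB_HTML = """
<ol class="breadcrumb container">
  <li class="first"><a href="http://xxxx.com/index.php?route=common/home"><span>Home</span></a></li>
  <li><a href="http://xxxx.com/books"><span>Books</span></a></li>
  <li class="last"><a href="http://xxxxx.com/books?product_id=193" class="last"><span>My Vision : Challenges in the Race for Excellence - Mohammed Bin Rashid Al Maktoum</span></a></li>
</ol>
"""


def middle_breadcrumbs(markup):
    """Return the text of every breadcrumb except the first and the last.

    Hypothetical helper for illustration: it selects by position only,
    so it gives the same answer when the class attributes are missing.
    """
    root = ET.fromstring(markup.strip())
    # The root element is the <ol>; its direct <li> children are the crumbs.
    items = root.findall("li")
    # Drop the first entry (Home) and the last entry (product name).
    return [li.findtext("a/span") for li in items[1:-1]]


categories = middle_breadcrumbs(BREADCRUMB_HTML)
print(categories)
```

In a spider, the same slice could presumably be applied to the list returned by sel.xpath('//ol[@class="breadcrumb container"]/li'), although that is untested against the site in question.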
