Hello,
Background: I need to get the product category from the breadcrumbs. For
example, given the breadcrumb Home > Books > Bookname, I need to get only "Books".
HTML code:
<ol class="breadcrumb container">
<li class="first"><a href="http://xxxx.com/index.php?route=common/home"><span>Home</span></a></li>
<li><a href="http://xxxx.com/books"><span>Books</span></a></li>
<li class="last"><a href="http://xxxxx.com/books?product_id=193" class="last"><span>My Vision : Challenges in the Race for Excellence - Mohammed Bin Rashid Al Maktoum</span></a></li>
</ol>
The XPath I use in the browser console, which correctly returns "Books":
//ol[@class="breadcrumb container"]/li[not(contains(@class,"first")) and not(contains(@class,"last"))]/a/span/text()
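To make the intended result concrete, here is a minimal sketch that applies the same "neither first nor last" rule to the snippet above. It uses the standard library's xml.etree.ElementTree rather than Scrapy's selectors, and it assumes the snippet is well-formed enough to parse as XML. The names BREADCRUMB_HTML and middle_categories are hypothetical, chosen only for this illustration.

```python
import xml.etree.ElementTree as ET

# The breadcrumb markup from this post, joined into one string.
BREADCRUMB_HTML = (
    '<ol class="breadcrumb container">'
    '<li class="first"><a href="http://xxxx.com/index.php?route=common/home">'
    '<span>Home</span></a></li>'
    '<li><a href="http://xxxx.com/books"><span>Books</span></a></li>'
    '<li class="last"><a href="http://xxxxx.com/books?product_id=193" class="last">'
    '<span>My Vision : Challenges in the Race for Excellence - '
    'Mohammed Bin Rashid Al Maktoum</span></a></li>'
    '</ol>'
)


def middle_categories(markup):
    """Return the span text of every <li> whose class is neither first nor last."""
    root = ET.fromstring(markup)  # root is the <ol> element
    categories = []
    for li in root.findall("li"):
        # A missing class attribute is treated as an empty class list.
        classes = (li.get("class") or "").split()
        if "first" in classes or "last" in classes:
            continue
        span = li.find("a/span")
        if span is not None and span.text:
            categories.append(span.text.strip())
    return categories


print(middle_categories(BREADCRUMB_HTML))  # prints ['Books']
```

This only confirms what the expression is meant to select from the markup as shown here. It says nothing about what the spider actually receives.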
My Python code:
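# XPath for breadcrumb items that are neither "first" nor "last".
# Adjacent string literals are joined, so the expression has no embedded line breaks.
category_xpath = (
    '//ol[@class="breadcrumb container"]'
    '/li[not(contains(@class,"first")) and not(contains(@class,"last"))]'
    '/a/span/text()'
)
for cat in sel.xpath(category_xpath).extract():
    categories[catIndex] = cat
    catIndex += 1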
When I run my Scrapy spider, it returns all three li elements, including
Home (with class "first") and the book name (with class "last").
I tried running "scrapy view http://xxx.com" to see the page as the spider
sees it, and the XPath works correctly there:
http://prntscr.com/8a7a4u
But when I run the Scrapy shell and try the same XPath there, it returns
all three li elements:
http://prntscr.com/8a77xe
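For comparison, the category could also be picked by position instead of by class, which would not depend on the class attributes being present in the markup the spider receives. Whether the raw response really lacks those attributes is only an assumption here; the screenshots do not confirm it. A minimal sketch, again using xml.etree.ElementTree rather than Scrapy's selectors, with hypothetical names (PLAIN_BREADCRUMB, middle_by_position) and hypothetical class-free markup:

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the same breadcrumb, but with no class attributes on <li>.
PLAIN_BREADCRUMB = (
    '<ol class="breadcrumb container">'
    '<li><a href="http://xxxx.com/index.php?route=common/home"><span>Home</span></a></li>'
    '<li><a href="http://xxxx.com/books"><span>Books</span></a></li>'
    '<li><a href="http://xxxxx.com/books?product_id=193"><span>Bookname</span></a></li>'
    '</ol>'
)


def middle_by_position(markup):
    """Return the span text of every <li> except the first and the last."""
    root = ET.fromstring(markup)  # root is the <ol> element
    items = root.findall("li")
    texts = []
    # Slicing off both ends leaves only the intermediate breadcrumb levels.
    for li in items[1:-1]:
        span = li.find("a/span")
        if span is not None and span.text:
            texts.append(span.text.strip())
    return texts


print(middle_by_position(PLAIN_BREADCRUMB))  # prints ['Books']
```

In XPath terms, the equivalent predicate would be li[position() > 1 and position() < last()] in place of the two not(contains(@class, ...)) tests.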
Does anyone have an idea what the problem might be?
--
You received this message because you are subscribed to the Google Groups
"scrapy-users" group.
Visit this group at http://groups.google.com/group/scrapy-users.