Incorrect xpath values when spider crawls website

netcrime Sat, 29 Aug 2015 08:50:02 -0700

Hello,

Background: I need to get the product category from the breadcrumbs. For 
example, from the breadcrumb Home > Books > Bookname I need to get only Books.


HTML code:
<ol class="breadcrumb container">
        <li class="first"><a href="http://xxxx.com/index.php?route=common/home"><span>Home</span></a></li>
        <li><a href="http://xxxx.com/books"><span>Books</span></a></li>
        <li class="last"><a href="http://xxxxx.com/books?product_id=193" class="last"><span>My Vision : Challenges in the Race for Excellence - Mohammed Bin Rashid Al Maktoum</span></a></li>
    </ol>

The XPath I use in the browser console, which correctly returns "Books":

//ol[@class="breadcrumb container"]/li[not(contains(@class,"first")) and 
not(contains(@class,"last"))]/a/span/text()

My Python code:

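# sel, categories and catIndex are defined earlier in the spider.
# The XPath is split into adjacent string literals so that no line break
# ends up inside the expression.
xpath = ('//ol[@class="breadcrumb container"]'
         '/li[not(contains(@class,"first")) and not(contains(@class,"last"))]'
         '/a/span/text()')
for cat in sel.xpath(xpath).extract():
    categories[catIndex] = cat
    catIndex += 1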

When I run my Scrapy spider, it returns all three li elements, including 
Home (with class first) and the book name (with class last).
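
To make the expected result concrete, here is a minimal sketch that applies the same filter to the HTML snippet above using only the Python standard library. It is a sanity check, not the spider code: ElementTree's XPath subset has no not() or contains(), so the class filtering is done in Python instead, and the names BREADCRUMB_HTML and middle_categories are made up for this illustration. It also assumes the snippet parses as well-formed XML, which holds for the markup as posted.

```python
import xml.etree.ElementTree as ET

# The breadcrumb markup from the post (hypothetical constant name).
BREADCRUMB_HTML = (
    '<ol class="breadcrumb container">'
    '<li class="first"><a href="http://xxxx.com/index.php?route=common/home">'
    '<span>Home</span></a></li>'
    '<li><a href="http://xxxx.com/books"><span>Books</span></a></li>'
    '<li class="last"><a href="http://xxxxx.com/books?product_id=193" class="last">'
    '<span>My Vision : Challenges in the Race for Excellence - '
    'Mohammed Bin Rashid Al Maktoum</span></a></li>'
    '</ol>'
)


def middle_categories(markup):
    """Return the span text of every <li> that has neither class first nor last."""
    root = ET.fromstring(markup)  # root is the <ol> element
    categories = []
    for li in root.findall("li"):
        classes = li.get("class", "").split()
        if "first" in classes or "last" in classes:
            continue  # skip the Home link and the product name
        span = li.find("a/span")
        if span is not None and span.text:
            categories.append(span.text.strip())
    return categories


print(middle_categories(BREADCRUMB_HTML))
```

On this snippet only the middle item survives the filter, which is the result I expect from the spider as well.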

I tried running scrapy view http://xxx.com to see the page as the spider 
sees it, and the XPath works correctly there.

http://prntscr.com/8a7a4u

But when I run scrapy shell and try the same XPath there, it returns all 
three li elements:

http://prntscr.com/8a77xe


Does anyone have an idea what the problem might be?
