Selecting custom elements from html document

Mark Betz Wed, 30 Apr 2014 02:55:34 -0700

I am using a downloader middleware to add a custom element to an HTML 
document before spidering. The custom element contains contextual 
information about the document that we want to pass down to the spider for 
selector access. After the middleware executes, the document looks like this:


<html>
    <body>
    <!-- ... -->
    </body>
    <myelement>
        <!-- other stuff -->
    </myelement>
</html>
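
For reference, here is a minimal sketch of the kind of middleware described 
above. It is not the actual code: the class name, the append_context helper, 
and the "context_value" meta key are all hypothetical, and it assumes the 
body is UTF-8 with a lowercase closing </html> tag. The only Scrapy pieces it 
relies on are the process_response(request, response, spider) hook and 
response.replace(body=...).

```python
from xml.sax.saxutils import escape

CLOSING_HTML = b"</html>"


def append_context(body, value, encoding="utf-8"):
    """Insert <myelement><mystuff>value</mystuff></myelement> before the
    last </html> tag (hypothetical helper, case-sensitive match)."""
    element = "<myelement><mystuff>{}</mystuff></myelement>".format(
        escape(value)
    ).encode(encoding)
    head, sep, tail = body.rpartition(CLOSING_HTML)
    if not sep:
        # No closing </html> found: fall back to appending at the end.
        return body + element
    return head + element + sep + tail


class ContextElementMiddleware:
    """Downloader middleware sketch that adds the custom element."""

    def process_response(self, request, response, spider):
        # "context_value" is an assumed meta key, used here for illustration.
        value = request.meta.get("context_value", "")
        new_body = append_context(response.body, value)
        # replace() returns a copy of the response with the new body.
        return response.replace(body=new_body)
```

The element is inserted after </body> and before </html>, matching the 
layout shown above.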


Inside the spider's parse method I attempt to access data in this sub-tree 
using selector.xpath, passing an XPath expression like this: 
"//myelement/mystuff/text()"

This selector always returns an empty result set. I can set a breakpoint 
right before the call to selector.xpath, dump response.body to a file, 
open it in Chrome, and use the same XPath to access the data, so I know the 
data is there and the XPath to it is valid. Am I missing something in 
Scrapy's behavior or configuration that could account for this?

Thanks for any ideas!
