I am using a downloader middleware to add a custom element to an HTML
document before it reaches the spider. The custom element contains contextual
information about the document that I want to pass down to the spider for
selector access. After the middleware executes, the document looks like this:
<html>
<body>
<!-- ... -->
</body>
<myelement>
<!-- other stuff -->
</myelement>
</html>
Inside the spider's parse method I try to access data in this subtree
using selector.xpath, passing an XPath string like this:
"//myelement/mystuff/text()"
This selector always returns an empty result set. If I set a breakpoint
right before the call to selector.xpath, dump response.body to a file, and
open it in Chrome, the same XPath finds the data, so I know the data is
there and the XPath is valid. Am I missing something in Scrapy's behavior
or configuration that could account for this?
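One way to narrow this down might be to parse the same bytes outside the
spider and look at the tree the parser actually builds. Scrapy's selectors
are built on lxml, and lxml's HTML parser may repair markup differently from
Chrome, possibly including elements that appear after </body>. The sketch
below is only a diagnostic under assumptions: SAMPLE is a made-up stand-in
for the real response.body, the <mystuff> child and its text are hypothetical,
and inspect_tree is an illustrative helper name, not part of Scrapy or lxml.

```python
from lxml import etree

# Hypothetical stand-in for the document the middleware produces.
# The <mystuff> element and its text are invented for illustration.
SAMPLE = b"""<html>
<body>
<p>page content</p>
</body>
<myelement>
<mystuff>hello</mystuff>
</myelement>
</html>"""


def inspect_tree(body, xpath):
    """Parse body with lxml's HTML parser.

    Returns a (dump, matches) pair: the serialized tree as the parser
    rebuilt it, and whatever the XPath expression selects in that tree.
    """
    root = etree.fromstring(body, etree.HTMLParser())
    dump = etree.tostring(root, pretty_print=True, encoding="unicode")
    return dump, root.xpath(xpath)


if __name__ == "__main__":
    dump, matches = inspect_tree(SAMPLE, "//myelement/mystuff/text()")
    print(dump)     # shows where <myelement> ended up after parsing
    print(matches)  # what the XPath selects in the rebuilt tree
```

Running the same helper on the file dumped from response.body, instead of
SAMPLE, would show whether the element survives parsing and where it sits,
which may differ from what Chrome displays.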
Thanks for any ideas!