I can't tell without a code snippet, but my suggestion would be to load it as a spider middleware instead, stacked as close to the spider as possible. That by itself should *not* solve this problem, though. Does your selector work at all? Which Scrapy version are you using? You may be accidentally introducing character encoding issues. How do you serialize the modified HTML back to a string before returning response.replace(body=newbody)?
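For what it's worth, this is roughly the shape of downloader middleware I would expect from your description, with the encoding handled explicitly. It is only a sketch: the class name, the injected markup, and the insertion point before </html> are my assumptions, not code from your project.

from scrapy.http import HtmlResponse

class MyElementMiddleware(object):
    """Sketch only: injects a placeholder <myelement> block before </html>."""

    def process_response(self, request, response, spider):
        if not isinstance(response, HtmlResponse):
            return response
        extra = u'<myelement><mystuff>some context</mystuff></myelement>'
        # Do the edit on unicode text, then encode with the response's own
        # encoding so response.replace() does not change the character set.
        body = response.body_as_unicode().replace(u'</html>', extra + u'</html>')
        return response.replace(body=body.encode(response.encoding))

If instead you reparse the document with lxml or BeautifulSoup and serialize it back, that is exactly where a stray encoding or a restructured tree can sneak in, which is why I asked how you dump the HTML back to a string.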
On Wednesday, 30 April 2014 06:15:45 UTC+3, Mark Betz wrote:
>
> I am using a downloader middleware to add a custom element to an html
> document before spidering. The custom element contains contextual
> information about the document that we want to pass down to the spider
> for selector access. After the middleware executes the document looks
> like this:
>
> <html>
>   <body>
>     <!-- ... -->
>   </body>
>   <myelement>
>     <!-- other stuff -->
>   </myelement>
> </html>
>
> Inside the spider's parse method I attempt to access data in this
> sub-tree using selector.xpath, passing an xpath string like this:
> "//myelement/mystuff/text()"
>
> This selector always returns an empty result set. I can set a breakpoint
> right before the call to selector.xpath and dump response.body to a file,
> open it in Chrome, and use the same xpath to access the data, so I know
> the data is there and the xpath to it is valid. Am I missing something in
> scrapy's behavior or configuration that could account for this?
>
> Thanks for any ideas!
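For comparison, a minimal parse callback along the lines of the quoted post might look like the following. Again a sketch only: the spider name, URL, and logging are placeholders, and only the element names come from the example document.

import scrapy
from scrapy.selector import Selector

class ContextSpider(scrapy.Spider):
    name = 'context_example'
    start_urls = ['http://example.com/']  # placeholder URL

    def parse(self, response):
        sel = Selector(response)
        # If this comes back empty while the same xpath works on the dumped
        # file in Chrome, compare sel.extract() (the reparsed tree) against
        # response.body: the HTML parser may have moved or dropped markup
        # that sits after </body>.
        context = sel.xpath('//myelement/mystuff/text()').extract()
        self.log('injected context: %r' % context)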
