Thanks, Nikolaos. This turned out to be a simple problem: I was trying to select upper-case tag names, and the lxml HTML parser lowercases tag names when it parses the document. I just wasn't seeing the forest for the trees last night. I appreciate your reply.
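
For anyone who lands here with the same symptom, here is a minimal sketch of the behavior using lxml directly rather than a Scrapy selector. The document string and the mixed-case names `MyElement` and `MyStuff` are made up for illustration; they are not the actual tags from my middleware.

```python
import lxml.html

# Hypothetical document: the custom element was injected with mixed-case names.
doc = (
    "<html><body><p>page content</p></body>"
    "<MyElement><MyStuff>data</MyStuff></MyElement>"
    "</html>"
)

# The HTML parser normalizes element names to lower case while building the tree.
root = lxml.html.fromstring(doc)

# XPath name tests are case-sensitive, so the mixed-case path finds nothing.
upper_hits = root.xpath("//MyElement/MyStuff/text()")

# The same path in lower case matches the parsed tree.
lower_hits = root.xpath("//myelement/mystuff/text()")

print(upper_hits)
print(lower_hits)
```

Chrome's XPath evaluation treats element names in HTML documents case-insensitively, which is probably why the same expression appeared to work there.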
On Tuesday, April 29, 2014 11:15:45 PM UTC-4, Mark Betz wrote:
>
> I am using a downloader middleware to add a custom element to an html
> document before spidering. The custom element contains contextual
> information about the document that we want to pass down to the spider for
> selector access. After the middleware executes the document looks like this:
>
> <html>
>   <body>
>     <!-- ... -->
>   </body>
>   <myelement>
>     <!-- other stuff -->
>   </myelement>
> </html>
>
> Inside the spider's parse method I attempt to access data in this sub-tree
> using selector.xpath, passing an xpath string like this:
> "//myelement/mystuff/text()"
>
> This selector always returns an empty result set. I can set a breakpoint
> right before the call to selector.xpath and dump response.body to a file,
> open it in Chrome, and use the same xpath to access the data, so I know the
> data is there and the xpath to it is valid. Am I missing something in
> scrapy's behavior or configuration that could account for this?
>
> Thanks for any ideas!
