I can't tell without a code snippet, but my suggestion would be to load it as a spider middleware instead, stacked as close to the spider as possible. That by itself should *not* solve this problem, though. Does your selector work at all? Which Scrapy version are you using? You may be accidentally introducing character encoding issues. How do you serialize the modified HTML back to a string before returning response.replace(body=newbody)?
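For what it's worth, this is roughly the shape of downloader middleware I would expect from your description, with the encoding handled explicitly. It is only a sketch: the class name, the injected markup, and the insertion point before </html> are my assumptions, not code from your project.

from scrapy.http import HtmlResponse

class MyElementMiddleware(object):
    """Sketch only: injects a placeholder <myelement> block before </html>."""

    def process_response(self, request, response, spider):
        if not isinstance(response, HtmlResponse):
            return response
        extra = u'<myelement><mystuff>some context</mystuff></myelement>'
        # Do the edit on unicode text, then encode with the response's own
        # encoding so response.replace() does not change the character set.
        body = response.body_as_unicode().replace(u'</html>', extra + u'</html>')
        return response.replace(body=body.encode(response.encoding))

If instead you reparse the document with lxml or BeautifulSoup and serialize it back, that is exactly where a stray encoding or a restructured tree can sneak in, which is why I asked how you dump the HTML back to a string.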
On Wednesday, 30 April 2014 06:15:45 UTC+3, Mark Betz wrote:
>
> I am using a downloader middleware to add a custom element to an html
> document before spidering. The custom element contains contextual
> information about the document that we want to pass down to the spider
> for selector access. After the middleware executes the document looks
> like this:
>
> <html>
>   <body>
>     <!-- ... -->
>   </body>
>   <myelement>
>     <!-- other stuff -->
>   </myelement>
> </html>
>
> Inside the spider's parse method I attempt to access data in this
> sub-tree using selector.xpath, passing an xpath string like this:
> "//myelement/mystuff/text()"
>
> This selector always returns an empty result set. I can set a breakpoint
> right before the call to selector.xpath and dump response.body to a file,
> open it in Chrome, and use the same xpath to access the data, so I know
> the data is there and the xpath to it is valid. Am I missing something in
> scrapy's behavior or configuration that could account for this?
>
> Thanks for any ideas!
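For comparison, a minimal parse callback along the lines of the quoted post might look like the following. Again a sketch only: the spider name, URL, and logging are placeholders, and only the element names come from the example document.

import scrapy
from scrapy.selector import Selector

class ContextSpider(scrapy.Spider):
    name = 'context_example'
    start_urls = ['http://example.com/']  # placeholder URL

    def parse(self, response):
        sel = Selector(response)
        # If this comes back empty while the same xpath works on the dumped
        # file in Chrome, compare sel.extract() (the reparsed tree) against
        # response.body: the HTML parser may have moved or dropped markup
        # that sits after </body>.
        context = sel.xpath('//myelement/mystuff/text()').extract()
        self.log('injected context: %r' % context)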
