Re: Selecting custom elements from html document

Mark Betz Wed, 30 Apr 2014 07:22:35 -0700

Thanks, Nikolaos. This turned out to be a simple problem: I was trying to 
select uppercase tag names, and the lxml HTML parser lowercases tag names 
when it parses the document. I just wasn't seeing the forest for the trees 
last night. Appreciate your reply.
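
For anyone who finds this thread later, here is a minimal sketch of the 
behavior, assuming lxml is installed. The tag names MyElement and MyStuff 
are hypothetical stand-ins for our real ones, and the snippet calls 
lxml.html directly rather than going through a Scrapy selector:

```python
import lxml.html

# Hypothetical document: a custom mixed-case element appended after <body>,
# similar to what the downloader middleware produces.
html = (
    "<html><body><p>page content</p></body>"
    "<MyElement><MyStuff>data</MyStuff></MyElement></html>"
)

# The HTML parser normalizes tag names to lowercase while building the tree.
root = lxml.html.fromstring(html)

# XPath name tests are case-sensitive, so the original casing finds nothing.
by_original_case = root.xpath("//MyElement/MyStuff/text()")

# The lowercased names match the parsed tree.
by_lowercase = root.xpath("//myelement/mystuff/text()")

print(by_original_case)
print(by_lowercase)
```

The first query should come back empty and the second should return the 
text, so the fix on the spider side is simply to write the XPath with 
lowercase element names.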


On Tuesday, April 29, 2014 11:15:45 PM UTC-4, Mark Betz wrote:
>
> I am using a downloader middleware to add a custom element to an html 
> document before spidering. The custom element contains contextual 
> information about the document that we want to pass down to the spider for 
> selector access. After the middleware executes the document looks like this:
>
> <html>
>     <body>
>     <!-- ... -->
>     </body>
>     <myelement>
>         <!-- other stuff -->
>     </myelement>
> </html>
>
>
> Inside the spider's parse method I attempt to access data in this sub-tree 
> using selector.xpath, passing an xpath string like this: 
> "//myelement/mystuff/text()"
>
> This selector always returns an empty result set. I can set a breakpoint 
> right before the call to selector.xpath and dump response.body to a file, 
> open it in Chrome, and use the same xpath to access the data, so I know the 
> data is there and the xpath to it is valid. Am I missing something in 
> scrapy's behavior or configuration that could account for this?
>
> Thanks for any ideas!
>
>

