I'm using Scrapy to get information from this website:
<http://guia.bcn.cat/index.php?pg=search&q=*:*>
The HTML that I want to scrape has the following structure:
<div id="llista-resultats">
    <div>
        <h3>
            <a href="URL"> Title </a>
        </h3>
        <div class="dades">
            <dl>
                <dt> </dt>
                <dd> </dd>
                ...
            </dl>
        </div>
    </div>
    <div>
        ... (the same structure repeats for each result)
I have run some tests and I know how to extract the information, but the
problem with the following code is that I get all the titles, then all the
URLs, and so on, when what I want is to pair the first title with the first
URL, the second title with the second URL, etc.
from scrapy.selector import Selector
from scrapy.spiders import CrawlSpider  # scrapy.contrib.spiders in older releases

from bcn.items import BcnItem  # assumed module path of the project's items.py


class BcnSpider(CrawlSpider):
    name = 'bcn'
    allowed_domains = ['guia.bcn.cat']
    start_urls = ['http://guia.bcn.cat/index.php?pg=search&q=*:*']

    def parse(self, response):
        sel = Selector(response)
        sites = sel.xpath("//div[@id='llista-resultats']")
        items = []
        for site in sites:
            item = BcnItem()
            item['title'] = site.xpath(
                "//div[@id='llista-resultats']//h3/a/text()").extract()
            item['url'] = site.xpath(
                "//div[@id='llista-resultats']//h3/a/@href").extract()
            item['when'] = site.xpath(
                "//div[@id='llista-resultats']//div[@class='dades']/dl/dd/text()"
            ).extract()
            items.append(item)
        return items
I think the error is caused by using "//" in each item's XPath, but I haven't
managed to get only the information that is a descendant of
sites = sel.xpath("//div[@id='llista-resultats']").
Here is my post on Stack Overflow:
<http://stackoverflow.com/questions/20908790/i-cant-scrape-div-parameters-of-the-website-scrapy>
Thanks for your help.
--
You received this message because you are subscribed to the Google Groups
"scrapy-users" group.
Visit this group at http://groups.google.com/group/scrapy-users.