I'm using Scrapy to get information from this website:
<http://guia.bcn.cat/index.php?pg=search&q=*:*>

The HTML that I want to scrape has the following structure:

    <div id="llista-resultats">
     <div>
      <h3>
       <a href="URL"> Title </a>
      <div class="dades">
       <dl>
        <dt> </dt>
        <dd> </dd>
        ...
      </div>
     <div>
      ... and the same structure repeats for each result


I have run some tests and I know how to extract the information. The problem
with the following code is that I get all the titles, then all the URLs, and
so on, whereas what I want is to pair the first title with the first URL, the
second title with the second URL, etc.

 
    class BcnSpider(CrawlSpider):
        name = 'bcn'
        allowed_domains = ['guia.bcn.cat']
        start_urls = ['http://guia.bcn.cat/index.php?pg=search&q=*:*']

        def parse(self, response):
            sel = Selector(response)
            sites = sel.xpath("//div[@id='llista-resultats']")
            items = []
            for site in sites:
                item = BcnItem()
                item['title'] = site.xpath(
                    "//div[@id='llista-resultats']//h3/a/text()").extract()
                item['url'] = site.xpath(
                    "//div[@id='llista-resultats']//h3/a/@href").extract()
                item['when'] = site.xpath(
                    "//div[@id='llista-resultats']//div[@class='dades']/dl/dd/text()").extract()
                items.append(item)
            return items


I think the error is that I'm using "//" at the start of each item's XPath,
but I haven't managed to get only the information that is a descendant of
sites = sel.xpath("//div[@id='llista-resultats']").
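
To make the intended result concrete, here is a minimal sketch of pairing each title with its own URL by looping over the individual result blocks and using paths relative to each block. It rests on several assumptions. The sample markup is made up to match the structure described above (including closing tags that the outline omits). It uses the standard library's xml.etree.ElementTree rather than Scrapy's Selector, so that it runs without Scrapy installed. The function name extract_results is hypothetical. In Scrapy, the analogous change would probably be to select each inner result div and start the inner XPaths with a dot, for example site.xpath(".//h3/a/text()").

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data shaped like the structure described above.
SAMPLE = """
<div id="llista-resultats">
  <div>
    <h3><a href="/first">First title</a></h3>
    <div class="dades"><dl><dt>When</dt><dd>Monday</dd></dl></div>
  </div>
  <div>
    <h3><a href="/second">Second title</a></h3>
    <div class="dades"><dl><dt>When</dt><dd>Tuesday</dd></dl></div>
  </div>
</div>
"""


def extract_results(markup):
    """Return one dict per result block, keeping title, URL and dates together."""
    root = ET.fromstring(markup)  # root is the <div id="llista-resultats">
    items = []
    # Iterate over each result <div>, not over the container as a whole.
    for result in root.findall("./div"):
        # The leading "." makes the path relative to this result only.
        link = result.find(".//h3/a")
        if link is None:
            continue  # skip blocks that have no title link
        dates = result.findall(".//div[@class='dades']/dl/dd")
        items.append({
            "title": (link.text or "").strip(),
            "url": link.get("href"),
            "when": [dd.text.strip() for dd in dates if dd.text],
        })
    return items


items = extract_results(SAMPLE)
for item in items:
    print(item)
```

The point of the sketch is the leading "." in the inner paths: a path starting with "//" searches the whole document again, whereas ".//" searches only below the current node.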

Here is my post on Stack Overflow:
<http://stackoverflow.com/questions/20908790/i-cant-scrape-div-parameters-of-the-website-scrapy>

Thanks to all
