Looks like the XPath selectors you are using return more than one item per
page, e.g. site.xpath('.//race/@id') matches every race in the meeting. The
xpath() call gives you a SelectorList with all the matching elements, and
extract() turns that into a list of every matched value, which is why each
CSV cell ends up holding a whole list.
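If each <meeting> contains several <race> elements, looping over the races
themselves would give you one item, and therefore one CSV row, per race. A
rough sketch, untested and assuming that structure (it would replace the
parse method in your spider below):

    def parse(self, response):
        # one item per <race> element instead of one item per <meeting>
        for race in response.xpath('//meeting//race'):
            item = ConvXmlItem()
            # extract_first() returns a single value, not a list of all matches
            item['id'] = race.xpath('@id').extract_first()
            item['num'] = race.xpath('@number').extract_first()
            item['dist'] = race.xpath('@distance').extract_first()
            yield item

(extract_first() needs Scrapy 1.0 or newer; on older releases .extract()[0]
does the same thing.)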
Can you paste an excerpt of the XML file that you are parsing?

On Sun, Jan 24, 2016 at 4:02 AM, Sayth Renshaw <[email protected]> wrote:

> Hi all
>
> Currently when I output to csv with scrapy runspider myxml.py -o
> ~/items.csv -t csv, I get the header items I defined in settings under
> feed export, however the values collected for each field are dumped as
> whole lists rather than one value per row.
>
> Where do I define that dict[0] for each element should be on its own line?
>
> So at the moment this is my output:
>
> id,num,dist
> "209165,209166,209167,209168,209169,209170,209171,209172,209173","1,2,3,4,5,6,7,8,9","1000,1000,1400,1200,1200,1600,1600,1000,2000"
>
> I would want it as:
>
> id,num,dist
> 209165,1,1000
> 209166,2,1000
> ...
>
> I have been looking at feed exporters in the docs for info, but I feel I
> should just be creating a custom function to tidy it up. Is that what I
> should do, and if so, where? Scrapy seems to have thought of most things,
> so I expect this is already handled; I am just not sure what it's called.
>
> My current code:
>
> # -*- coding: utf-8 -*-
> import scrapy
> from scrapy.selector import Selector
> from scrapy.http import HtmlResponse
> from scrapy.selector import XmlXPathSelector
> from conv_xml.items import ConvXmlItem
> # http://stackoverflow.com/a/27391649/461887
> import json
>
>
> class MyxmlSpider(scrapy.Spider):
>     name = "myxml"
>
>     start_urls = (
>         ["file:///home/sayth/Downloads/20160123RAND0.xml"]
>     )
>
>     def parse(self, response):
>         sel = Selector(response)
>         sites = sel.xpath('//meeting')
>         items = []
>
>         for site in sites:
>             item = ConvXmlItem()
>             # item['venue'] = site.xpath('.//@venue').extract()
>             item['id'] = site.xpath('.//race/@id').extract()
>             item['num'] = site.xpath('.//race/@number').extract()
>             item['dist'] = site.xpath('.//race/@distance').extract()
>             items.append(item)
>
>         return items
>
> Thanks
> Sayth

--
Valdir Stumm Junior
Developer Evangelist, Scrapinghub <https://scrapinghub.com>

We turn web content into structured data. Lead maintainers of Scrapy <http://scrapy.org>.
