Re: Where do I define my output format for item dictionaries

Sayth Renshaw Mon, 25 Jan 2016 04:12:09 -0800

Hi

> Looks like the XPath selectors you are using are returning more than 
> one item for each page, e.g. site.xpath('.//race/@id').


Yes, it does. That selector usually returns 8 values; the count can vary, 
though not often. Other selectors could return up to 24 values.
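
Because each selector returns a parallel list, one possible way to get a row per race is to zip the lists together. This is only a minimal sketch, assuming every list has the same length and order; the values are the first three from the csv output quoted below, and the names ids, nums, dists and rows are made up for illustration.

```python
# Column-wise lists, shaped like what .extract() gives for each selector.
ids = ["209165", "209166", "209167"]
nums = ["1", "2", "3"]
dists = ["1000", "1000", "1400"]

# zip() pairs up the n-th value of every list, giving one dict per race.
# Caveat: zip() stops at the shortest list, so a race that lacks an
# attribute would silently shift or drop rows.
rows = [
    {"id": i, "num": n, "dist": d}
    for i, n, d in zip(ids, nums, dists)
]

for row in rows:
    print(row["id"], row["num"], row["dist"], sep=",")
```

If the lists can differ in length (8 values for one selector, up to 24 for another), zipping is not safe, and reading the attributes race by race is probably the better route.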

An excerpt may be messy, so I will try to edit it down to a small copy. The 
originals are posted on a public website. This is a link (don't click unless 
you accept the download, as it's a direct 
link): http://old.racingnsw.com.au/Site/_content/racebooks/20160130RHIL0.xml

Here is one race element (one id) by itself, and yes, they love attributes. 
In the example above, though, I am trying to filter my output so that for 
each .//race/@id I extract, I also output the desired attributes of that 
race. The goal is a csv or json file that has all the ids together with the 
descriptors taken from the attributes.

<race id="209165" number="1" nomnumber="2" division="0" name="SCHWEPPES 
QUALITY" mediumname="WILKES" shortname="WILKES" stage="Acceptances" 
distance="1000" minweight="55" raisedweight="1" class="~         " age="3   
      " grade="4" weightcondition="QLT       " trophy="0" owner="0" 
trainer="0" jockey="0" strapper="0" totalprize="85000" first="48750" 
second="16750" third="8350" fourth="4150" fifth="2000" 
time="2016-01-23T12:40:00" bonustype="BOB7      " nomsfee="0" acceptfee="0" 
trackcondition="          " timingmethod="          " fastesttime="         
 " sectionaltime="          " formavailable="0" racebookprize="Of $85000. 
First $48750, second $16750, third $8350, fourth $4150, fifth $2000, sixth 
$1000, seventh $1000, eighth $1000, ninth $1000, tenth $1000">
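
The idea of one output row per race id can be sketched with the standard library alone. This is a hedged illustration, not Scrapy's own mechanism: SAMPLE is a trimmed stand-in for the racebook file that keeps only three attributes, and race_rows is a hypothetical helper name.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for the racebook XML: only three attributes kept.
SAMPLE = """
<meeting>
  <race id="209165" number="1" distance="1000"/>
  <race id="209166" number="2" distance="1000"/>
  <race id="209167" number="3" distance="1400"/>
</meeting>
""".strip()


def race_rows(xml_text):
    """Yield one dict per <race> element instead of one per meeting."""
    root = ET.fromstring(xml_text)
    for race in root.iter("race"):
        yield {
            "id": race.get("id"),
            "num": race.get("number"),
            "dist": race.get("distance"),
        }


rows = list(race_rows(SAMPLE))

# Write the rows as csv, one line per race.
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer, fieldnames=["id", "num", "dist"], lineterminator="\n"
)
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

In a Scrapy spider the same approach would presumably be to loop over response.xpath('//race') and yield one item per race, reading each attribute with a relative XPath such as race.xpath('@id').extract_first(), so that the feed exporter writes one line per item.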

Thanks
Sayth

> Looks like the XPath selectors you are using are returning more than 
> one item for each page, e.g. site.xpath('.//race/@id'). xpath() returns 
> a SelectorList with all the matching elements inside, and extract() turns 
> it into a list of strings, one per match.
>
> Can you paste an excerpt of the XML file that you are parsing?
>
> On Sun, Jan 24, 2016 at 4:02 AM, Sayth Renshaw wrote:
>
>>
>> Hi all
>>
>> Currently, when I output to csv with scrapy runspider myxml.py -o ~/items.csv 
>> -t csv, I get the header items I defined in settings under feed export. 
>> However, the values collected for each field are dumped together into a 
>> single row.
>>
>> Where do I define that each element (dict[0], dict[1], ...) should be on 
>> its own line?
>>
>> So at the moment this is my output
>>
>> id,num,dist
>>
>> "209165,209166,209167,209168,209169,209170,209171,209172,209173","1,2,3,4,5,6,7,8,9","1000,1000,1400,1200,1200,1600,1600,1000,2000"
>>
>> I would want it as
>>
>> id,num,dist
>> 209165,1,1000
>> 209166,2,1000
>> ...
>>
>> I am looking at the feed exporters in the docs for info, but I feel I should 
>> just be creating a custom function to tidy it up. Is that what I should do, 
>> and if yes, where? Scrapy seems to have thought of most things, so I expect 
>> this is already done; I am just not sure what it is called.
>>
>> My current code.
>>
>> # -*- coding: utf-8 -*-
>> import scrapy
>> from scrapy.selector import Selector
>> from scrapy.http import HtmlResponse
>> from scrapy.selector import XmlXPathSelector
>> from conv_xml.items import ConvXmlItem
>> # http://stackoverflow.com/a/27391649/461887
>> import json
>>
>>
>> class MyxmlSpider(scrapy.Spider):
>>     name = "myxml"
>>
>>     start_urls = (
>>         ["file:///home/sayth/Downloads/20160123RAND0.xml"]
>>     )
>>
>>     def parse(self, response):
>>         sel = Selector(response)
>>         sites = sel.xpath('//meeting')
>>         items = []
>>
>>         for site in sites:
>>             item = ConvXmlItem()
>>             # item['venue'] = site.xpath('.//@venue').extract()
>>             item['id'] = site.xpath('.//race/@id').extract()
>>             item['num'] = site.xpath('.//race/@number').extract()
>>             item['dist'] = site.xpath('.//race/@distance').extract()
>>             items.append(item)
>>
>>         return items
>>
>>
>> Thanks Sayth
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "scrapy-users" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at https://groups.google.com/group/scrapy-users.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
>
> -- 
> Valdir Stumm Junior 
> Developer Evangelist, Scrapinghub 
>
> *We turn web content into structured data. Lead maintainers of Scrapy 
> <http://scrapy.org>.*
>

