Hi
> Looks like the XPath selectors you are using are returning more than
> one item for each page, e.g. site.xpath('.//race/@id').
Yes, it does. For that selector it is mostly 8 occurrences; it can vary,
but not often. Other selectors could have up to 24 items in them.

An excerpt may be messy, so I will try to edit it down to a small copy. The
originals are posted on a public website; this is a link (don't click unless
you accept the download, as it is a direct link):
http://old.racingnsw.com.au/Site/_content/racebooks/20160130RHIL0.xml

Below is a single race element by itself, and yes, they love attributes. For
my output, I am trying to filter so that for each .//race/@id I extract I
can output the desired attributes, so that I end up with a csv or json file
which has all the ids and the descriptors from the attributes.

<race id="209165" number="1" nomnumber="2" division="0" name="SCHWEPPES
QUALITY" mediumname="WILKES" shortname="WILKES" stage="Acceptances"
distance="1000" minweight="55" raisedweight="1" class="~ " age="3
" grade="4" weightcondition="QLT " trophy="0" owner="0"
trainer="0" jockey="0" strapper="0" totalprize="85000" first="48750"
second="16750" third="8350" fourth="4150" fifth="2000"
time="2016-01-23T12:40:00" bonustype="BOB7 " nomsfee="0" acceptfee="0"
trackcondition=" " timingmethod=" " fastesttime="
" sectionaltime=" " formavailable="0" racebookprize="Of $85000.
First $48750, second $16750, third $8350, fourth $4150, fifth $2000, sixth
$1000, seventh $1000, eighth $1000, ninth $1000, tenth $1000">
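
To make the desired "one row per race" output concrete, here is a minimal
sketch that uses only the Python standard library rather than Scrapy. The
sample XML is a hypothetical, trimmed-down stand-in for the racebook file
(only the id, number and distance attributes are kept), and the names
SAMPLE_XML, FIELDS, race_rows and rows_to_csv are made up for illustration.
In the spider itself, the same shape would presumably be a loop over
response.xpath('//race') that yields one item per race and reads each
attribute with something like race.xpath('@id').extract_first(), instead of
collecting every race of a meeting into one item.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down sample with the same shape as the racebook file:
# one <meeting> holding several <race> elements that carry data as attributes.
SAMPLE_XML = """\
<meeting>
  <race id="209165" number="1" distance="1000"/>
  <race id="209166" number="2" distance="1000"/>
  <race id="209167" number="3" distance="1400"/>
</meeting>
"""

# Output column name -> attribute name on <race> (illustrative mapping).
FIELDS = {"id": "id", "num": "number", "dist": "distance"}


def race_rows(xml_text):
    """Yield one dict per <race>, not one dict of lists per meeting."""
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree, so <race> is found at any depth.
    for race in root.iter("race"):
        # A missing attribute becomes an empty string in the row.
        yield {column: race.get(attr, "") for column, attr in FIELDS.items()}


def rows_to_csv(rows):
    """Serialise the row dicts to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(FIELDS), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(rows_to_csv(race_rows(SAMPLE_XML)), end="")
```

The key point is that the loop runs over the race elements and each attribute
is read relative to the current race, so every row stays aligned with a
single race id.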
Thanks
Sayth
> Looks like the XPath selectors you are using are returning more than
> one item for each page, e.g. site.xpath('.//race/@id'). Calling extract()
> on that SelectorList returns a list with all the matching values inside.
>
> Can you paste an excerpt of the XML file that you are parsing?
>
> On Sun, Jan 24, 2016 at 4:02 AM, Sayth Renshaw wrote:
>
>>
>> Hi all
>>
>> Currently, when I output to csv with
>> scrapy runspider myxml.py -o ~/items.csv -t csv
>> I get the header items I defined in settings under feed export. However,
>> the collected values are dumped as whole lists, one per field, instead of
>> one row per element.
>>
>> Where do I define that dict[0] for each element should be on its own line?
>>
>> So at the moment this is my output
>>
>> id,num,dist
>>
>> "209165,209166,209167,209168,209169,209170,209171,209172,209173","1,2,3,4,5,6,7,8,9","1000,1000,1400,1200,1200,1600,1600,1000,2000"
>>
>> I would want it as
>>
>> id,num,dist
>> 209165,1,1000
>> 209166,2,1000
>> ...
>>
>> I have been looking at the feed exporters in the docs for info, but I feel
>> I should just be creating a custom function to tidy it up. Is that what I
>> should do, and if so, where? Scrapy seems to have thought of most things, so
>> I expect this is already done; I am just not sure what it is called.
>>
>> My current code.
>>
>> # -*- coding: utf-8 -*-
>> import scrapy
>> from scrapy.selector import Selector
>> from scrapy.http import HtmlResponse
>> from scrapy.selector import XmlXPathSelector
>> from conv_xml.items import ConvXmlItem
>> # http://stackoverflow.com/a/27391649/461887
>> import json
>>
>>
>> class MyxmlSpider(scrapy.Spider):
>>     name = "myxml"
>>
>>     start_urls = (
>>         ["file:///home/sayth/Downloads/20160123RAND0.xml"]
>>     )
>>
>>     def parse(self, response):
>>         sel = Selector(response)
>>         sites = sel.xpath('//meeting')
>>         items = []
>>
>>         for site in sites:
>>             item = ConvXmlItem()
>>             # item['venue'] = site.xpath('.//@venue').extract()
>>             item['id'] = site.xpath('.//race/@id').extract()
>>             item['num'] = site.xpath('.//race/@number').extract()
>>             item['dist'] = site.xpath('.//race/@distance').extract()
>>             items.append(item)
>>
>>         return items
>>
>>
>> Thanks Sayth
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "scrapy-users" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at https://groups.google.com/group/scrapy-users.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
>
> --
> Valdir Stumm Junior
> Developer Evangelist, Scrapinghub
> https://scrapinghub.com
>