Re: [R] Extracting xml data to data frames

2015-04-17 Thread gavinr
Hi, I will re-post using dput() to recreate the xml file.  I will need some
time to find a file small enough to demonstrate the problem.

Gavin R.



--
View this message in context: 
http://r.789695.n4.nabble.com/Extracting-xml-data-to-data-frames-tp4705964p4705981.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Extracting xml data to data frames

2015-04-16 Thread John Kane
No attachment came through: R-help is rather fussy about the files it will
accept. You are probably okay with .txt, .pdf, or .png, but even .csv is
likely to get stripped.

The best way to supply data is by using dput(). Type ?dput for information
or have a look at http://adv-r.had.co.nz/Reproducibility.html for some hints.
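
As a rough illustration of the dput() advice (a minimal sketch; the three XML lines below are a made-up stand-in, not taken from the poster's file), the idea is to paste the printed output into the message so that readers can rebuild the same object:

```r
# Hypothetical stand-in for a few lines of an XML file; in practice
# these would come from something like readLines("your-file.xml").
xml_lines <- c(
  "<Routes>",
  "  <Route id=\"R1\"><Description>Town centre</Description></Route>",
  "</Routes>"
)

# dput() prints R code that recreates the object; that printed text
# is what gets pasted into the email.
dput(xml_lines)

# A reader pastes the output back into R. Here the round trip is
# simulated by deparsing the object and evaluating the result.
restored <- eval(parse(text = deparse(xml_lines)))
identical(restored, xml_lines)
```

For a large file, something like dput(head(xml_lines, 50)) keeps the post short, provided the excerpt still shows the problem.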


John Kane
Kingston ON Canada


 -----Original Message-----
 From: g.ru...@bham.ac.uk
 Sent: Thu, 16 Apr 2015 17:57:44 +
 To: r-help@r-project.org
 Subject: [R] Extracting xml data to data frames
 
 Hi Rgonauts,
 
 I am trying to parse some xml files of transport data in the
 TransXChange format (in this case bus routing information) and obtain
 some data.frames for onward processing for a GIS-related task.  Ideally I
 need them as .csv files.
 
 Each file (an example is attached) contains up to 8 tables of information
 about transport operators and routing information.  I have uploaded an
 example that contains all 8.  In fact I have some hundreds of similar
 files that will need processing. So when I've solved this I will need to
 be able to loop through a bunch of them.
 
 I'm new to handling xml data and to the XML package, so I don't really
 know what I'm doing; this is my first stab at using the package.
 So far the workflow goes something like this.
 
 #get the file
 library(XML)
 doc=xmlTreeParse("cen_18-23-D-y11-2.xml")
 top=xmlRoot(doc)
 
 #look at the names
 names(top)
 
 #pick one of them to use, in this case the fourth one, 'routes', a table
 #of information about this particular bus route. Using some code from
 #another forum post, I can get a data.frame with the info I need in it.
 #OK, I need to do some reshaping but I can handle that later.
 
 fr4 <- (top[[4]])
 fr4
 xmlSApply(fr4,function(x) xmlSApply(x,xmlValue))
 df <- as.data.frame(xmlSApply(fr4,function(x) xmlSApply(x,xmlValue)))
 df
 
 #this works, but when I try it with another table, the fifth one say, which
 #captures information about the parts of the journey between stops, it
 #falls over.
 
 fr5 <- (top[[5]])
 fr5
 xmlSApply(fr5,function(x) xmlSApply(x,xmlValue))
 df <- as.data.frame(xmlSApply(fr5,function(x) xmlSApply(x,xmlValue)))
 df
 
 Now I guess there is an irregularity in the xml causing this.  I gather
 from other posts that I should use XPath functionality to interrogate this
 section of the data. I've tried reverse engineering some of the
 commands I've seen in solutions to irregular xml problems on other forums
 but have not got to what I want. I'm not really up on xml, but I am
 assuming that the JourneySectionPattern id= part of the file is what is
 causing the problem.  It looks like there should be a field called
 JourneyPattern ID (only, I guess, without the space) with the ID code as
 the actual field contents.
 
 So my question is, is there a way to parse this table correctly and
 output the resulting df as a csv?
 
 All help gratefully received.  BTW the link to the searchable r-help
 archives seems to be broken.
 
 GavinR
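
One possible way to approach the question quoted above, offered as a sketch rather than a tested solution for these files: parse with xmlParse() so that XPath can be used, select the repeating child nodes directly, and pull the parent's id attribute alongside each child's values. The XML fragment, the element names, and the output file name are all illustrative assumptions, since the real file was not attached.

```r
library(XML)

# Hypothetical fragment; element names are assumed for illustration and
# a real file is likely to be larger and to declare a namespace.
xml_text <- '
<JourneyPatternSections>
  <JourneyPatternSection id="JPS1">
    <JourneyPatternTimingLink id="L1"><RunTime>PT2M</RunTime></JourneyPatternTimingLink>
    <JourneyPatternTimingLink id="L2"><RunTime>PT3M</RunTime></JourneyPatternTimingLink>
  </JourneyPatternSection>
  <JourneyPatternSection id="JPS2">
    <JourneyPatternTimingLink id="L3"><RunTime>PT1M</RunTime></JourneyPatternTimingLink>
  </JourneyPatternSection>
</JourneyPatternSections>'

# xmlParse() builds an internal (C-level) document, which XPath requires
doc <- xmlParse(xml_text, asText = TRUE)

# One node per timing link, regardless of which section it sits in
links <- getNodeSet(doc, "//JourneyPatternTimingLink")

# One row per link; the section id is read from the parent's attribute
df <- data.frame(
  section_id = sapply(links, function(n) xmlGetAttr(xmlParent(n), "id")),
  link_id    = sapply(links, xmlGetAttr, "id"),
  run_time   = sapply(links, function(n) xmlValue(n[["RunTime"]])),
  stringsAsFactors = FALSE
)

# Write the table out as csv (to a temporary directory in this sketch)
out_file <- file.path(tempdir(), "journey_sections.csv")
write.csv(df, out_file, row.names = FALSE)
```

Two caveats. Values held in attributes such as id= are not returned by xmlValue(), which may be part of why the nested xmlSApply() approach gives an irregular result. Also, if the real files declare a default namespace, the XPath expressions will need a prefix supplied through the namespaces argument of getNodeSet(). To process many files, the same steps could be wrapped in a function and applied over list.files(pattern = "\\.xml$").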
 
 
 
 