Hi Rgonauts,

I am trying to parse some XML files of transport data in the TransXChange 
format (in this case bus routing information) and obtain some data.frames for 
onward processing in a GIS-related task.  Ideally I need them as .csv files.

Each file (an example is attached) contains up to 8 tables of information about 
transport operators and routing information.  I have uploaded an example that 
contains all 8.  In fact I have some hundreds of similar files that will need 
processing. So when I've solved this I will need to be able to loop through a 
bunch of them.
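
For the looping part, a minimal sketch might look like the following. It assumes the XML package is installed; the folder, the two tiny stand-in files, and the read_routes() helper are all made up for illustration and are much simpler than the real files.

```r
library(XML)

# Hypothetical setup: two tiny stand-in files in a temporary folder.
in_dir <- file.path(tempdir(), "txc_demo")
dir.create(in_dir, showWarnings = FALSE)
writeLines("<Routes><Route><Desc>North</Desc></Route></Routes>",
           file.path(in_dir, "a.xml"))
writeLines("<Routes><Route><Desc>South</Desc></Route></Routes>",
           file.path(in_dir, "b.xml"))

# Hypothetical per-file function: one row per <Route> element.
read_routes <- function(path) {
  doc <- xmlParse(path)
  data.frame(file = basename(path),
             Desc = xpathSApply(doc, "//Route/Desc", xmlValue),
             stringsAsFactors = FALSE)
}

# Loop over every .xml file in the folder and write a .csv next to it.
files <- list.files(in_dir, pattern = "\\.xml$", full.names = TRUE)
for (f in files) {
  out <- sub("\\.xml$", ".csv", f)
  write.csv(read_routes(f), out, row.names = FALSE)
}
```

Once a per-file function exists for the real tables, only in_dir and the function body should need to change.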

I'm new to handling XML data, and this is my first stab at using the XML 
package, so I don't really know what I'm doing.
So far the workflow goes something like this.

#get the file
library(XML)
doc=xmlTreeParse("cen_18-23-D-y11-2.xml")
top=xmlRoot(doc)

#look at the names
names(top)

#pick one of them to use, in this case the fourth one, 'routes', a table of 
information about this particular bus route. Using some code from another forum 
post, I can get a data.frame with the info I need in it.  OK, I need to do some 
reshaping, but I can handle that later.

fr4<-(top[[4]])
fr4
xmlSApply(fr4,function(x) xmlSApply(x,xmlValue))
df<-as.data.frame(xmlSApply(fr4,function(x) xmlSApply(x,xmlValue)))
df

#this works but when I try it with another table, the fifth one say, that 
captures information about the parts of the journey between stops, it falls 
over.

fr5<-(top[[5]])
fr5
xmlSApply(fr5,function(x) xmlSApply(x,xmlValue))
df<-as.data.frame(xmlSApply(fr5,function(x) xmlSApply(x,xmlValue)))
df
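
A quick way to check whether the records in a table are irregular might be to count the children of each record. This is only a sketch on a made-up fragment (the element names are hypothetical, not taken from the real file), assuming the failure comes from records with differing numbers of fields.

```r
library(XML)

# Hypothetical stand-in for an irregular table: the second record has an
# extra child element.
txt <- '<Sections>
  <Section id="S1"><From>A</From><To>B</To></Section>
  <Section id="S2"><From>B</From><To>C</To><RunTime>PT2M</RunTime></Section>
</Sections>'
node <- xmlRoot(xmlTreeParse(txt, asText = TRUE))

# Number of child elements per record. Unequal counts mean the nested
# xmlSApply() call cannot simplify to a matrix and returns a list instead.
child_counts <- xmlSApply(node, xmlSize)
print(child_counts)

res <- xmlSApply(node, function(x) xmlSApply(x, xmlValue))
print(is.list(res))
```

If the counts differ, as.data.frame() has nothing rectangular to work with, which would be consistent with the call falling over.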

Now I guess there is an irregularity in the XML causing this.  I gather from 
other posts that I should use XPath functionality to interrogate this section 
of the data. I've tried reverse engineering some of the commands I've seen in 
solutions to irregular XML problems on other forums, but have not got to what 
I want. I'm not really up on XML, but I am assuming that the 
<JourneySectionPattern id=****> part of the file is what is causing the 
problem?  It looks like there should be a field called JourneyPattern ID (only 
I guess without the space) with the ID code as the actual field contents.
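
For reference, the kind of XPath approach I have in mind might look like the sketch below. It runs on a made-up fragment, so the element names, the field list and the child_text() helper are all hypothetical. It pulls the id attribute out with xmlGetAttr() and fills missing fields with NA so every record has the same columns. If the real files declare a default namespace, the XPath expressions would also need a namespace prefix, which this sketch ignores.

```r
library(XML)

# Hypothetical stand-in: records carry an id attribute and the first one
# lacks the RunTime field.
txt <- '<Sections>
  <Section id="S1"><From>A</From><To>B</To></Section>
  <Section id="S2"><From>B</From><To>C</To><RunTime>PT2M</RunTime></Section>
</Sections>'
doc <- xmlParse(txt, asText = TRUE)
records <- getNodeSet(doc, "//Section")

# Hypothetical helper: text of the first matching child, or NA if absent.
child_text <- function(node, name) {
  hit <- xpathSApply(node, paste0("./", name), xmlValue)
  if (length(hit) == 0) NA_character_ else hit[[1]]
}

# One column per expected field, one row per record.
fields <- c("From", "To", "RunTime")
cols <- lapply(fields, function(f)
  vapply(records, child_text, character(1), name = f))
names(cols) <- fields

df <- data.frame(
  id = vapply(records, xmlGetAttr, character(1),
              name = "id", default = NA_character_),
  cols,
  stringsAsFactors = FALSE
)

out_file <- file.path(tempdir(), "sections.csv")
write.csv(df, out_file, row.names = FALSE)
```

Naming the fields up front is a design choice: it keeps the data.frame rectangular even when some records are missing elements.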

So my question is, is there a way to parse this table correctly and output the 
resulting df as a csv?

All help gratefully received.  BTW the link to the searchable R-help archives 
seems to be broken. 

GavinR




______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
