Hi,

You should keep replies on the list - you never know when someone will swoop in 
with the right answer to make your life easier.

Below is a simple example that uses XPath syntax to identify (and in this case 
retrieve) children that match your XPath expression. XPath expressions are 
sort of like /a/directory/structure/description, so you can visualize the 
elements of an XML document as nested folders or subdirectories.
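To make the folder analogy concrete, here is a minimal sketch on a tiny made-up document. The <menu>, <food>, <name> and <price> elements are invented for illustration. It shows an absolute path from the root, a "//" search at any depth, and a [n] predicate that picks one record.

```r
library(xml2)

# a tiny, made-up document: <menu> holds <food> records like subfolders
doc = read_xml(paste0(
  "<menu>",
  "<food><name>Waffles</name><price>$5.95</price></food>",
  "<food><name>Toast</name><price>$4.50</price></food>",
  "</menu>"))

# absolute path: start at the root and walk down, like /a/directory/structure
abs_names = xml_text(xml_find_all(doc, "/menu/food/name"))

# "//" searches at any depth, so this finds the same <name> elements
any_names = xml_text(xml_find_all(doc, "//name"))

# a predicate in [] filters: only the <name> inside the second <food>
second = xml_text(xml_find_all(doc, "/menu/food[2]/name"))

print(abs_names)  # [1] "Waffles" "Toast"
print(second)     # [1] "Toast"
```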

Hopefully this will get you started. A lot more on XPath is here:
http://www.w3schools.com/xml/xml_xpath.asp
There are other extraction tools in xml2 - just type ?xml2 at the command 
prompt to see more.
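One of those other tools, xml_find_first(), is worth knowing about for the blank-space/NA problem. If you pull each field with its own "//" search, a record that lacks an element simply yields a shorter vector, so values can end up in the wrong row (or data.frame() stops because the lengths differ). A sketch, assuming a made-up menu where one <food> has no <calories>: find the record nodes first, then search relative to each one so a missing element becomes NA in its own row.

```r
library(xml2)

# made-up menu: the second <food> has no <calories> element
doc = read_xml(paste0(
  "<breakfast_menu>",
  "<food><name>Waffles</name><calories>650</calories></food>",
  "<food><name>Toast</name></food>",
  "</breakfast_menu>"))

# a "//" search returns only the elements that exist: 2 names, 1 calorie value
print(length(xml_find_all(doc, "//calories")))  # [1] 1

# find the records first, then search *relative* to each one ("./");
# xml_find_first() returns one result per record (a missing node when
# nothing matches), and xml_text()/xml_double() turn missing nodes into NA
foods = xml_find_all(doc, "//food")
X = data.frame(
  name = xml_text(xml_find_first(foods, "./name")),
  calories = xml_double(xml_find_first(foods, "./calories")),
  stringsAsFactors = FALSE)
print(X)
```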

Since you have more deeply nested elements you'll need to play with this a bit 
first.

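library(xml2)
# small example file: a <breakfast_menu> of <food> records
uri = 'http://www.w3schools.com/xml/simple.xml'
x = read_xml(uri)

# "//name" matches every <name> element anywhere in the document
name_nodes = xml_find_all(x, "//name")
name = xml_text(name_nodes)

# prices look like "$5.95", so keep them as text
price_nodes = xml_find_all(x, "//price")
price = xml_text(price_nodes)

# calories are plain numbers, so convert them with xml_double()
calories_nodes = xml_find_all(x, "//calories")
calories = xml_double(calories_nodes)

# these vectors only line up if every <food> has all three elements
X = data.frame(name, price, calories, stringsAsFactors = FALSE)
# row.names = FALSE avoids writing an extra unnamed index column
write.csv(X, file = 'foo.csv', row.names = FALSE)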
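For deeper nesting like your patient records, one generic approach is to treat each patient as one row and every leaf element (an element with no child elements) under it as a column, filling NA wherever a patient lacks that field. This is only a sketch: the <records>/<patient>/<demographics>/<vitals> names and the output path are hypothetical stand-ins for your real (private) file.

```r
library(xml2)

# hypothetical stand-in for the real file: all element names are invented
doc = read_xml(paste0(
  "<records>",
  "<patient><id>1</id>",
  "<demographics><age>34</age><sex>F</sex></demographics>",
  "<vitals><pulse>72</pulse></vitals></patient>",
  "<patient><id>2</id>",
  "<demographics><age>51</age></demographics>",
  "<vitals><pulse>88</pulse><resp>16</resp></vitals></patient>",
  "</records>"))

patients = xml_find_all(doc, "/records/patient")

# one named character vector per patient: each leaf element (no child
# elements) becomes a value named "<parent>_<leaf>", e.g. "vitals_pulse"
rows = lapply(patients, function(p) {
  leaves = xml_find_all(p, ".//*[not(*)]")
  vals = xml_text(leaves)
  names(vals) = vapply(leaves, function(l)
    paste(xml_name(xml_parent(l)), xml_name(l), sep = "_"), character(1))
  vals
})

# every column seen in any patient, in first-seen order
cols = unique(unlist(lapply(rows, names)))

# indexing by name gives NA where a patient lacks a field, so rows stay aligned
vals = unlist(lapply(rows, function(r) unname(r[cols])))
flat = as.data.frame(matrix(vals, nrow = length(rows), byrow = TRUE,
                            dimnames = list(NULL, cols)),
                     stringsAsFactors = FALSE)

# hypothetical output path; point this at wherever the CSV should go
out = file.path(tempdir(), "patients_flat.csv")
write.csv(flat, out, row.names = FALSE)
```

One caveat: repeated fields within one patient (say, several medications) would get the same column name here and only the first would be kept. Those are usually better written to a second table keyed by patient id.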

Cheers,
Ben

> On Jan 4, 2017, at 2:13 PM, Andrew Lachance <alach...@bates.edu> wrote:
> 
> Hello Ben,
> 
> Thank you for the advice. I am extremely new to any sort of coding so I have 
> learned a lot already. Essentially, I was given an XML file and was told to 
> convert all of it to a csv so that it could be uploaded into a database. 
> Unfortunately, the information I am working with is medical information, so I 
> can't really share it. I initially tried to convert it using online programs; 
> however, that produced a large number of blank spaces that weren't useful 
> for uploading into the database.
> 
> So essentially, my goal is to parse all the data in the XML to a coherent, 
> succinct CSV that could be uploaded. In the document, there are 361 patient 
> files with 13 subcategories for each patient, which further branch off into 
> around 150 categories total. Since I am so new, I have been having a hard 
> time seeing the bigger picture or knowing if there are any intermediary steps 
> that will prevent all the blank spaces that the online conversion programs 
> created.
> 
> I will look through the information on the xml2 package. Any advice or 
> recommendations would be greatly appreciated as I have felt fairly stuck. 
> Once again, thank you very much for your help.
> 
> Best,
> Andrew
> 
> On Tue, Jan 3, 2017 at 2:29 PM, Ben Tupper <btup...@bigelow.org> wrote:
> Hi,
> 
> It's hard to know what to advise - much depends upon the XML data you have 
> and what you want to extract from it. Without knowing about those two things 
> there is little anyone could do to help. Can you post some example data to 
> the internet and provide the link here? Then state explicitly what you want 
> to have in hand at the end.
> 
> If you are just starting out, I suggest that you try the xml2 package 
> (https://cran.r-project.org/web/packages/xml2/) rather than the XML package 
> (https://cran.r-project.org/web/packages/XML/). I have been using it much 
> more since the authors added the ability to create XML nodes (rather than 
> just extracting data from existing XML nodes).
> 
> Cheers,
> Ben
> 
> P.S.  Hello to my niece Olivia S on the Bates EMS team.
> 
> 
> > On Jan 3, 2017, at 11:27 AM, Andrew Lachance <alach...@bates.edu> wrote:
> >
> > http://stats.stackexchange.com/questions/254328/how-to-convert-a-large-xml-file-to-a-csv-file-using-r
> >
> > I am completely new to R and have tried to use several functions within the
> > xml packages to convert an XML to a csv and have had little success. Since
> > I am so new, I am not sure what the necessary steps are to complete this
> > conversion without a lot of NA.
> >
> > --
> > Andrew D. Lachance
> > Chief of Service, Bates Emergency Medical Service
> > Residence Coordinator, Hopkins House
> > Bates College Class of 2017
> > alach...@bates.edu
> > (207) 620-4854
> >
> > ______________________________________________
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> 
> Ben Tupper
> Bigelow Laboratory for Ocean Sciences
> 60 Bigelow Drive, P.O. Box 380
> East Boothbay, Maine 04544
> http://www.bigelow.org
> 
