SOLVED: (mostly) So I'll post this here in case it helps someone in the future.

    library(XML)
    fileName <- '001.xml'
    doc <- xmlTreeParse(fileName, handlers=list("comment"=function(x,...){NULL}), asTree = TRUE)
    root <- xmlRoot(doc)
    Connected_To <- xmlToDataFrame(getNodeSet(root, '//ccd:el-e3657392-8e26-42a4-a996-258f9e645c46',"ccd"))

Must use xmlTreeParse. Connected_To is a data frame with the element names as column names and one row for each instance in the file.

I still get a warning:

    Namespace prefix ccd on el-e3657392-8e26-42a4-a996-258f9e645c46 is not defined

I haven't quite found the solution for this namespace problem. I have tried a few functions/examples from the XML package, but they usually return a list instead of a character vector.

But the resulting data frame from the XML is correct, and that was my initial goal.

Cheers,
Tim

On Thu, Apr 17, 2014 at 3:54 PM, Timothy W. Cook <t...@mlhim.org> wrote:

> Apologies, I forgot to add details:
>
> platform       x86_64-pc-linux-gnu
> arch           x86_64
> os             linux-gnu
> system         x86_64, linux-gnu
> status
> major          3
> minor          0.1
> year           2013
> month          05
> day            16
> svn rev        62743
> language       R
> version.string R version 3.0.1 (2013-05-16)
> nickname       Good Sport
>
> Executed inside R Studio Version 0.98.501
>
> On Thu, Apr 17, 2014 at 12:35 PM, Timothy W. Cook <t...@mlhim.org> wrote:
>
>> R newbie, experienced software developer.
>>
>> I have a bit of confusion regarding using this function. See the XML
>> fragment at the end of the post.
>>
>> This works as far as retrieving the nodeset:
>>
>> > fileName <- '/home/tim/MLHIM/git/EpiS3/test_ccd/inst/examples/001.xml'
>> > doc <- xmlInternalTreeParse(fileName)
>> > nodes <- getNodeSet(doc,'//ccd:el-e3657392-8e26-42a4-a996-258f9e645c46')
>> > Connected_To=xmlToDataFrame(nodes)
>>
>> The result is:
>>
>> > str(Connected_To)
>> 'data.frame': 1 obs.
>> of 7 variables:
>>  $ comment         : Factor w/ 1 level " DvURI ": 1
>>  $ data-name       : Factor w/ 1 level "Connected To": 1
>>  $ valid-time-begin: Factor w/ 1 level " Use any subtype of ExceptionalValue here when a value is missing": 1
>>  $ valid-time-end  : Factor w/ 1 level "2020-06-07T01:31:49Z": 1
>>  $ DvURI-dv        : Factor w/ 1 level "2009-01-20T08:40:53Z": 1
>>  $ relation        : Factor w/ 1 level "http://www.ccdgen.com": 1
>>  $ NA              : Factor w/ 1 level "connected to": 1
>>
>> The line:
>>
>>  $ valid-time-begin: Factor w/ 1 level " Use any subtype of ExceptionalValue here when a value is missing": 1
>>
>> has a value that is a comment that occurs before the element. Then each
>> value is shifted by one place, with an extra one at the end.
>>
>> My desired solution would be to remove comments from the parsed tree.
>> So I did this:
>>
>> > doc <- xmlInternalTreeParse(fileName, handlers=list("comment"=function(x,...){NULL}), useInternalNodes = TRUE)
>>
>> But now when I attempt to get the nodes I get an error.
>>
>> > nodes <- getNodeSet(doc,'//ccd:el-e3657392-8e26-42a4-a996-258f9e645c46')
>> Error in UseMethod("xpathApply") :
>>   no applicable method for 'xpathApply' applied to an object of class "list"
>>
>> I am not sure what the error is telling me nor how to fix it.
>>
>> All suggestions are welcome and appreciated. Maybe I should be using a
>> different approach to solving the issue with extracting the data.frame?
>>
>> Regards,
>> Tim
>>
>> XML fragment:
>>
>> <ccd:el-e3657392-8e26-42a4-a996-258f9e645c46>
>>   <!-- DvURI -->
>>   <data-name>Connected To</data-name>
>>   <!-- Use any subtype of ExceptionalValue here when a value is missing-->
>>   <valid-time-begin>2020-06-07T01:31:49Z</valid-time-begin>
>>   <valid-time-end>2009-01-20T08:40:53Z</valid-time-end>
>>   <DvURI-dv>http://www.ccdgen.com</DvURI-dv>
>>   <relation>connected to</relation>
>> </ccd:el-e3657392-8e26-42a4-a996-258f9e645c46>
>>
>> --
>> MLHIM VIP Signup: http://goo.gl/22B0U
>> ============================================
>> Timothy Cook, MSc +55 21 994711995
>> MLHIM http://www.mlhim.org
>> Like Us on FB: https://www.facebook.com/mlhim2
>> Circle us on G+: http://goo.gl/44EV5
>> Google Scholar: http://goo.gl/MMZ1o
>> LinkedIn Profile: http://www.linkedin.com/in/timothywaynecook

--
============================================
Timothy Cook
LinkedIn Profile: http://www.linkedin.com/in/timothywaynecook
MLHIM http://www.mlhim.org

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
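
A possible follow-up on the two open points in this thread (the comment nodes that shift the columns, and the undefined ccd prefix). The sketch below keeps the document as internal nodes, removes the comment nodes with an XPath query instead of a handler, and passes an explicit prefix-to-URI mapping to getNodeSet. The wrapping <root> element and the namespace URI http://example.org/ccd are assumptions made up for illustration, since the thread shows neither; substitute whatever 001.xml actually declares. As far as I can tell from the XML package documentation, the earlier 'applied to an object of class "list"' error arises because passing handlers without asTree = TRUE returns the handlers list rather than the tree. Whether this also silences the namespace warning on the real file is untested.

```r
library(XML)

# Hypothetical stand-in for 001.xml. The <root> wrapper and the namespace URI
# are assumptions; the thread only shows the inner fragment.
xml_text <- '<root xmlns:ccd="http://example.org/ccd">
  <ccd:el-e3657392-8e26-42a4-a996-258f9e645c46>
    <!-- DvURI -->
    <data-name>Connected To</data-name>
    <!-- Use any subtype of ExceptionalValue here when a value is missing-->
    <valid-time-begin>2020-06-07T01:31:49Z</valid-time-begin>
    <valid-time-end>2009-01-20T08:40:53Z</valid-time-end>
    <DvURI-dv>http://www.ccdgen.com</DvURI-dv>
    <relation>connected to</relation>
  </ccd:el-e3657392-8e26-42a4-a996-258f9e645c46>
</root>'

# Parse into internal (C-level) nodes so XPath queries work on the document.
doc <- xmlParse(xml_text, asText = TRUE)

# Remove every comment node so that only element children remain.
removeNodes(getNodeSet(doc, "//comment()"))

# Bind the ccd prefix to its URI explicitly (assumed URI, see above).
ns <- c(ccd = "http://example.org/ccd")
nodes <- getNodeSet(doc, "//ccd:el-e3657392-8e26-42a4-a996-258f9e645c46",
                    namespaces = ns)

# One row per matched element, one column per child element.
Connected_To <- xmlToDataFrame(nodes, stringsAsFactors = FALSE)
str(Connected_To)
```

With a real file, replace the xmlParse call with xmlParse(fileName) and set ns to the URI bound to ccd in that file's root element.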