Re: [R] XML getNodeSet

Timothy W. Cook Sat, 19 Apr 2014 07:02:23 -0700

SOLVED: (mostly)

So I'll post this here in case it helps someone in the future.


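library(XML)

fileName <- '001.xml'

# Drop comment nodes during parsing. With handlers supplied, asTree = TRUE is
# what makes xmlTreeParse return the document tree instead of the handlers.
doc <- xmlTreeParse(fileName,
                    handlers = list("comment" = function(x, ...) NULL),
                    asTree = TRUE)
root <- xmlRoot(doc)

Connected_To <- xmlToDataFrame(getNodeSet(root,
    '//ccd:el-e3657392-8e26-42a4-a996-258f9e645c46', "ccd"))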

This must use xmlTreeParse, with asTree = TRUE.
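Why asTree = TRUE matters can be seen on a tiny inline document (a hypothetical stand-in, not 001.xml). As far as the XML package documentation goes, when handlers are supplied and asTree is left at its default of FALSE, the parser returns the handlers list rather than a document, which would explain the earlier "no applicable method for 'xpathApply' applied to an object of class "list"" error. A minimal sketch under that assumption:

```r
library(XML)

# Hypothetical inline document used only for illustration.
xml_text <- "<root><!-- a comment --><item>1</item></root>"
drop_comments <- list(comment = function(x, ...) NULL)

# Handlers supplied, asTree left at FALSE: the handlers list comes back.
res_handlers <- xmlTreeParse(xml_text, asText = TRUE, handlers = drop_comments)

# Handlers supplied, asTree = TRUE: the (R-level) document tree comes back,
# with the comment node discarded by the handler returning NULL.
res_tree <- xmlTreeParse(xml_text, asText = TRUE, handlers = drop_comments,
                         asTree = TRUE)

class(res_handlers)
xmlName(xmlRoot(res_tree))
```

Only the second object can be passed on to xmlRoot() and getNodeSet().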

Connected_To is a data frame with the child element names as column names
and one row per instance of the element in the file.
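For comparison, here is a sketch of a different route that keeps internal nodes (xmlParse) and deletes the comment nodes with removeNodes() before converting. The inline document and the namespace URI "http://example.org/ccd" are placeholders assumed for illustration; the real file declares its own ccd URI.

```r
library(XML)

# Hypothetical stand-in for 001.xml; the ccd namespace URI is a placeholder.
xml_text <- '<ccd:root xmlns:ccd="http://example.org/ccd">
  <ccd:el-e3657392-8e26-42a4-a996-258f9e645c46>
    <!-- DvURI -->
    <data-name>Connected To</data-name>
    <!-- Use any subtype of ExceptionalValue here when a value is missing-->
    <valid-time-begin>2020-06-07T01:31:49Z</valid-time-begin>
    <valid-time-end>2009-01-20T08:40:53Z</valid-time-end>
    <DvURI-dv>http://www.ccdgen.com</DvURI-dv>
    <relation>connected to</relation>
  </ccd:el-e3657392-8e26-42a4-a996-258f9e645c46>
</ccd:root>'

doc <- xmlParse(xml_text, asText = TRUE)

# Delete every comment node from the internal tree before conversion,
# so comments no longer show up as extra columns.
invisible(removeNodes(getNodeSet(doc, "//comment()")))

nodes <- getNodeSet(doc, "//ccd:el-e3657392-8e26-42a4-a996-258f9e645c46",
                    namespaces = c(ccd = "http://example.org/ccd"))
Connected_To <- xmlToDataFrame(nodes, stringsAsFactors = FALSE)
str(Connected_To)
```

With the comments gone, each child element should map to exactly one column and the values should no longer be shifted.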

I still get a warning:

Namespace prefix ccd on el-e3657392-8e26-42a4-a996-258f9e645c46 is not defined


I haven't found a solution for this namespace problem yet. I have tried a
few functions/examples from the XML package, but they usually return a list
instead of a character vector. However, the resulting data frame is correct,
and that was my initial goal.
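Two common ways to sidestep an undefined-prefix warning are sketched below, assuming the ccd prefix is declared somewhere in the file (the inline document and its URI "http://example.org/ccd" are hypothetical placeholders). Option 1 binds the prefix explicitly using the namespace definitions read from the document itself; option 2 avoids the prefix entirely with the XPath local-name() function.

```r
library(XML)

# Hypothetical stand-in for 001.xml; the ccd namespace URI is a placeholder.
xml_text <- '<ccd:root xmlns:ccd="http://example.org/ccd">
  <ccd:el-e3657392-8e26-42a4-a996-258f9e645c46>
    <data-name>Connected To</data-name>
    <relation>connected to</relation>
  </ccd:el-e3657392-8e26-42a4-a996-258f9e645c46>
</ccd:root>'
doc <- xmlParse(xml_text, asText = TRUE)

# Option 1: a named character vector c(prefix = URI) taken from the root node.
ns <- xmlNamespaceDefinitions(doc, simplify = TRUE)
nodes1 <- getNodeSet(doc, "//ccd:el-e3657392-8e26-42a4-a996-258f9e645c46",
                     namespaces = ns)

# Option 2: match on the element's local name, ignoring its namespace.
nodes2 <- getNodeSet(doc,
  "//*[local-name() = 'el-e3657392-8e26-42a4-a996-258f9e645c46']")

Connected_To <- xmlToDataFrame(nodes2, stringsAsFactors = FALSE)
```

Option 2 is the more forgiving of the two, since it keeps working even when the prefix-to-URI mapping is not available at the point where the XPath is evaluated.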

Cheers,
Tim



On Thu, Apr 17, 2014 at 3:54 PM, Timothy W. Cook <t...@mlhim.org> wrote:

> Apologies, I forgot to add details:
> platform       x86_64-pc-linux-gnu
> arch           x86_64
> os             linux-gnu
> system         x86_64, linux-gnu
> status
> major          3
> minor          0.1
> year           2013
> month          05
> day            16
> svn rev        62743
> language       R
> version.string R version 3.0.1 (2013-05-16)
> nickname       Good Sport
>
> Executed inside RStudio Version 0.98.501
>
>
>
> On Thu, Apr 17, 2014 at 12:35 PM, Timothy W. Cook <t...@mlhim.org> wrote:
>
>> R newbie, experienced software developer.
>>
>> I am a bit confused about how to use this function. See the XML
>> fragment at the end of the post.
>>
>> This works as far as retrieving the nodeset:
>>
>> > fileName <- '/home/tim/MLHIM/git/EpiS3/test_ccd/inst/examples/001.xml'
>> > doc <- xmlInternalTreeParse(fileName)
>> > nodes <- getNodeSet(doc, '//ccd:el-e3657392-8e26-42a4-a996-258f9e645c46')
>> > Connected_To <- xmlToDataFrame(nodes)
>>
>>
>> The result is:
>>
>> > str(Connected_To)
>> 'data.frame':     1 obs. of  7 variables:
>>  $ comment         : Factor w/ 1 level " DvURI ": 1
>>  $ data-name       : Factor w/ 1 level "Connected To": 1
>>  $ valid-time-begin: Factor w/ 1 level " Use any subtype of ExceptionalValue here when a value is missing": 1
>>  $ valid-time-end  : Factor w/ 1 level "2020-06-07T01:31:49Z": 1
>>  $ DvURI-dv        : Factor w/ 1 level "2009-01-20T08:40:53Z": 1
>>  $ relation        : Factor w/ 1 level "http://www.ccdgen.com": 1
>>  $ NA              : Factor w/ 1 level "connected to": 1
>>
>>
>> The line:
>>
>> $ valid-time-begin: Factor w/ 1 level " Use any subtype of ExceptionalValue here when a value is missing": 1
>>
>> holds the text of the comment that precedes the element instead of the
>> element's own value. From there on, each value is shifted by one column,
>> and the last value spills into an extra column named NA.
>>
>> My desired solution would be to remove the comments from the parsed tree,
>> so I did this:
>>
>> > doc <- xmlInternalTreeParse(fileName,
>> +   handlers=list("comment"=function(x,...){NULL}), useInternalNodes = TRUE)
>>
>> But now when I attempt to get the nodes I get an error.
>>
>> > nodes <- getNodeSet(doc, '//ccd:el-e3657392-8e26-42a4-a996-258f9e645c46')
>> Error in UseMethod("xpathApply") :
>>   no applicable method for 'xpathApply' applied to an object of class "list"
>>
>>
>> I am not sure what the error is telling me nor how to fix it.
>>
>>
>> All suggestions are welcome and appreciated. Maybe I should be using a
>> different approach to extracting the data.frame?
>>
>> Regards,
>> Tim
>>
>>
>>
>>
>> XML fragment:
>> <ccd:el-e3657392-8e26-42a4-a996-258f9e645c46>
>>       <!-- DvURI -->
>>       <data-name>Connected To</data-name>
>>       <!-- Use any subtype of ExceptionalValue here when a value is missing-->
>>       <valid-time-begin>2020-06-07T01:31:49Z</valid-time-begin>
>>       <valid-time-end>2009-01-20T08:40:53Z</valid-time-end>
>>       <DvURI-dv>http://www.ccdgen.com</DvURI-dv>
>>       <relation>connected to</relation>
>>     </ccd:el-e3657392-8e26-42a4-a996-258f9e645c46>
>>
>> --
>> MLHIM VIP Signup: http://goo.gl/22B0U
>> ============================================
>> Timothy Cook, MSc           +55 21 994711995
>> MLHIM http://www.mlhim.org
>> Like Us on FB: https://www.facebook.com/mlhim2
>> Circle us on G+: http://goo.gl/44EV5
>> Google Scholar: http://goo.gl/MMZ1o
>> LinkedIn Profile:http://www.linkedin.com/in/timothywaynecook
>>
>



______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
